
Difference between revisions of "Server Admin Log"

imported>Stashbot
(legoktm: deleted education@ from MM3, didn't import properly)
imported>Stashbot
(cwhite: end codfw opensearch upgrade T288621)
 
(189 intermediate revisions by 3 users not shown)
== 2021-05-07 ==
== 2021-12-07 ==
* 21:40 legoktm: deleted education@ from MM3, didn't import properly
* 00:10 cwhite: end codfw opensearch upgrade [[phab:T288621|T288621]]
* 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
* 21:33 legoktm: fixed owner for wdqs-gui-build list
* 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
* 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
* 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
* 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
* 18:23 brennen: 1.37.0-wmf.4 train status ([[phab:T281145|T281145]]): blockers appear resolved, going ahead in the interest of not having a split deploy over weekend
* 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: [[gerrit:685901{{!}}LinkBatch: skip bad input (T282180 T282070)]] (duration: 01m 06s)
* 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
* 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
* 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
* 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
* 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
* 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
* 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
* 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
* 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
* 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
* 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 13:04 Urbanecm: Start server-side upload for 1 video file ([[phab:T281927|T281927]])
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
* 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
* 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
* 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
* 09:55 dcausse: depooling wdqs1012 [[phab:T280382|T280382]], [[phab:T282222|T282222]]
* 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - [[phab:T281673|T281673]]
* 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
* 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - [[phab:T281673|T281673]]
* 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
* 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN [[phab:T282122|T282122]] (duration: 01m 10s)
* 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN [[phab:T282122|T282122]] (duration: 01m 06s)
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T282093|T282093]]', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
* 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json


== 2021-05-06 ==
== 2021-12-06 ==
* 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 ([[phab:T282193|T282193]])
* 22:19 mstyles@deploy1002: Synchronized php-1.38.0-wmf.9/includes/content/ContentModelChange.php: Deploy security patch for [[phab:T271037|T271037]] (duration: 00m 56s)
* 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 ([[phab:T282092|T282092]])
* 20:14 cwhite: begin codfw opensearch upgrade [[phab:T288621|T288621]]
* 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: [[gerrit:685890{{!}}Reorder tables in SpecialWatchlist (T282181)]] (duration: 00m 57s)
* 20:14 cwhite: begin codfw opensearch upgrade [[phab:T288621|T288621]]
* 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 ([[phab:T282092|T282092]])
* 19:58 legoktm: trying new dump of Special:CodeReview on mwmaint1002 ([[phab:T205361|T205361]])
* 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o ([[phab:T282092|T282092]])
* 19:26 legoktm: installing php-yaml on all appservers
* 21:11 hashar: restarted CI Jenkins due to [[phab:T281737|T281737]]
* 19:08 damilare: updated civicrm from {{Gerrit|b82183b9}} to {{Gerrit|311382de}}
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
* 19:04 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742835{{!}}bnwikibooks: add autopatrolled and patroller user groups (T296640)]] (duration: 00m 56s)
* 19:04 ejegg: updated fundraising CiviCRM from {{Gerrit|8034e47008}} to {{Gerrit|2052d79248}}
* 19:03 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1028.eqiad.wmnet with OS buster
* 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:685906{{!}}Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140)]] (duration: 01m 04s)
* 19:02 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|338d1df5903cdc963b9eef22ec2c1750b7b3a02b}}: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases ([[phab:T282160|T282160]]) (duration: 01m 05s)
* 19:02 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1028.eqiad.wmnet with OS buster
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|7e21cf0d96541d0ab5cb18cd7741756ab1dfe7b8}}: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases ([[phab:T282160|T282160]]) (duration: 01m 04s)
* 19:00 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - [[phab:T282140|T282140]] (duration: 01m 06s)
* 18:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
* 18:45 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
* 18:43 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
* 18:34 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
* 18:00 majavah: "foreachwiki namespaceDupes.php --fix {{!}} tee namespaceDupes-[[phab:T293839|T293839]]-fix.txt" FINISHED about 15 minutes ago [[phab:T293839|T293839]]
* 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:27 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T296897|T296897]] Move cirrus traffic to codfw (duration: 00m 56s)
* 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:24 majavah: starting "foreachwiki namespaceDupes.php --fix {{!}} tee namespaceDupes-[[phab:T293839|T293839]]-fix.txt" in mwmaint1002 screen session, [[phab:T293839|T293839]]
* 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
* 15:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2012.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2012.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 17:15 volans: upgrade spicerack on cumin* to 0.0.52
* 14:45 elukey: roll restart of nfacctd on netflow* nodes to pick up the new CA bundle for librdkafka
* 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
* 14:19 moritzm: draining primary/secondary instances off ganeti2012 [[phab:T296622|T296622]]
* 17:13 papaul: powerdown ms-be2057 for relocation
* 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2016.codfw.wmnet with OS buster
* 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4d8a75d5f01e8e2cf724e19db2e9bcc12fb8f5f4}}: Deploy Growth features on zhwiki in dark mode ([[phab:T287884|T287884]]) (duration: 00m 56s)
* 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=zhwiki --phab=[[phab:T287884|T287884]]
* 17:00 papaul: powerdown elastic2058 for relocation
* 13:52 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki growthexperiments # [[phab:T287884|T287884]]
* 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - [[phab:T281673|T281673]]
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2016.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:12 papaul: powerdown mc-gp2002 for relocation
* 13:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2016.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
* 13:30 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 Amir1: starting upgrade of public mailing lists in group d and e ([[phab:T280322|T280322]])
* 13:25 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
* 13:03 majavah: $ mwscript namespaceDupes.php --wiki barwiki --fix --add-prefix=BROKEN # [[phab:T293839|T293839]]
* 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
* 12:58 majavah: mwscript namespaceDupes.php --wiki skwiki --fix --add-prefix=BROKEN # [[phab:T293839|T293839]]
* 15:42 papaul: powerdown logstash2027 for relocation
* 12:54 majavah: mwscript namespaceDupes.php --wiki skwiki --fix # [[phab:T293839|T293839]]
* 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2011.codfw.wmnet with reason: readding to cluster after reimage
* 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 12:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2011.codfw.wmnet with reason: readding to cluster after reimage
* 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
* 12:48 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734383{{!}}Set default two-letter NS_PROJECT aliases (T293839)]] (duration: 00m 55s)
* 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 12:41 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:743533{{!}}Enable Autopatroller level page protection for English Wiktionary (T296580)]] (duration: 00m 56s)
* 15:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 12:28 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:743529{{!}}Enable SandboxLink extension for bnwikivoyage (T296637)]] (duration: 00m 55s)
* 15:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 12:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:743528{{!}}Enable groups autopatrolled and patroller for bnwikivoyage (T296637)]] (duration: 00m 56s)
* 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 12:15 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:743158{{!}}Enable SectionTranslation in Malayalam, Malay, Azerbaijani, Tamil, Bashkir and Albanian WPs (T285842)]] (duration: 00m 56s)
* 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
* 12:08 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742833{{!}}hewiki: add "templateeditor" permission group (T296769)]] (duration: 00m 57s)
* 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
* 15:26 ryankemper: [[phab:T280382|T280382]] [WDQS] Pooled `wdqs1007` and `wdqs2004`
* 11:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
* 15:26 ryankemper: [[phab:T280382|T280382]] `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 11:28 Amir1: dropping wikiadmin@localhost from all of s3 ([[phab:T296511|T296511]])
* 15:26 ryankemper: [[phab:T280382|T280382]] `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 11:21 Amir1: dropping wikiadmin@localhost from all of s2 ([[phab:T296511|T296511]])
* 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:12 moritzm: draining primary/secondary instances off ganeti2016 [[phab:T296622|T296622]]
* 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: switch to drbd storage
* 15:14 papaul: powerdown ms-be2053 for relocation
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: switch to drbd storage
* 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
* 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2011.codfw.wmnet with OS buster
* 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: [[phab:T270704|T270704]]
* 10:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: [[phab:T270704|T270704]]
* 10:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: [[phab:T270704|T270704]]
* 10:23 moritzm: draining primary/secondary instances off ganeti2015 [[phab:T296622|T296622]]
* 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: [[phab:T270704|T270704]]
* 09:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS buster
* 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:09 elukey: move kafka main codfw to fixed uid/gid for the kafka user (requires a stop/start of all daemons) - [[phab:T296982|T296982]]
* 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
* 08:13 moritzm: installing remaining icu security updates on buster
* 14:55 papaul: powerdown kafka-main2002 for relocation
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
* 13:21 XioNoX: push pfw policies - [[phab:T281942|T281942]]
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
* 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
* 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: [[gerrit:685752{{!}}Enable Extension:MediaSearch on betacommons (T265939)]] (duration: 01m 06s)
* 11:34 mlitn@deploy1002: sync-file aborted: Config: [[gerrit:685752{{!}}Enable Extension:MediaSearch on betacommons (T265939)]] (duration: 00m 56s)
* 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
* 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
* 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
* 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:685554{{!}}Enable ReferencePreviews as full default on pilot wikis (T271206)]] (duration: 01m 06s)
* 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:685554{{!}}Enable ReferencePreviews as full default on pilot wikis (T271206)]] (duration: 01m 06s)
* 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
* 11:12 kormat: reimaging db1173 to buster [[phab:T280751|T280751]]
* 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
* 10:19 jynus: stop dbprov2002 in advance of maintenance [[phab:T281135|T281135]]
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
* 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
* 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 host to free some space (tiny root partition, these are old kernels)
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
* 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
* 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye [[phab:T275873|T275873]]
* 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
* 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
* 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
* 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
* 07:47 jynus: shutting down and removing db2098:s3 instance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
* 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
* 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - [[phab:T281673|T281673]]
* 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 07:24 moritzm: installing exim security updates on bullseye hosts
* 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
* 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
* 06:01 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 06:00 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 [[phab:T281445|T281445]]', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
* 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
* 05:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 05:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing [[phab:T282070|T282070]]  RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
* 05:27 effie: upgrade scap to 3.17.1-1 - [[phab:T279695|T279695]]
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
* 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
* 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
* 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
* 03:38 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:38 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
* 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '<nowiki>{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}</nowiki>'`
* 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
* 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
* 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
* 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 00:35 Amir1: sudo service mailman3-web restart


== 2021-12-04 ==
* 01:14 mutante: mx2001 - did not come back from reboot, did not get IP on interface, could not start ferm, logged in via console with root password, in /etc/network/interfaces replaced all "ens5" with "ens13", rebooted again, selected previous kernel version
* 00:54 mutante: rebooting mx2001
* 00:31 jynus: manually restarting clamav on otrs1001 after being killed

== 2021-05-05 ==
* 23:35 ryankemper: [[phab:T281621|T281621]] [[phab:T281327|T281327]] [Elastic] Banned `elastic2033` and `elastic2043` from the Cirrussearch Elasticsearch clusters
* 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: {{Gerrit|4947241f876234aabc578409c3691fb791c8f715}}: Fix centering of as-of label (duration: 01m 08s)
* 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions ([[phab:T281564|T281564]])
* 22:05 mutante: pushing puppet run on all bastion hosts
* 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) [[phab:T281309|T281309]]
* 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|52b134ed84c1c8ef5fcd6927f03567879553d31c}}: Cross-wiki block should pass correct wiki blocker ([[phab:T281972|T281972]]) (duration: 01m 09s)
* 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|6526884848d0bb88c83cec2c6b39461542e21ef6}}: Cross-wiki block should pass correct wiki blocker ([[phab:T281972|T281972]]) (duration: 01m 08s)
* 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: {{Gerrit|f189c4627cfc692fb743160030a5e5ab92df1485}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 01m 09s)
* 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: {{Gerrit|8ffb52d5cad9e003696200b9cd3e957ab26bc868}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 01m 11s)
* 21:29 urbanecm@deploy1002: sync-file aborted: {{Gerrit|8ffb52d5cad9e003696200b9cd3e957ab26bc868}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 00m 04s)
* 20:37 ejegg: updated email preferences wiki (donorwiki) from {{Gerrit|d449599540}} to {{Gerrit|9f51ace546}}
* 20:36 ejegg: updated payments-wiki from {{Gerrit|d449599540}} to {{Gerrit|9f51ace546}}
* 20:20 ejegg: updated email preferences wiki (donorwiki) from {{Gerrit|a232fc3438}} to {{Gerrit|d449599540}}
* 19:59 jbond42: re-enable puppet post 685485
* 19:53 jbond42: disable puppet: rolling out change (685485) which affects all hosts
* 19:21 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
* 19:19 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
* 19:16 jbond42: ignore the last log message; will wait for the deploy to finish
* 19:16 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/tests/phpunit/includes: Backport: [[gerrit:685480{{!}}Fix order of joins in SpecialRecentChanges (T281981)]] (duration: 01m 10s)
* 19:16 jbond42: disable puppet: rolling out change (685485) which affects all hosts
* 19:14 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials: Backport: [[gerrit:685480{{!}}Fix order of joins in SpecialRecentChanges (T281981)]] (duration: 01m 08s)
* 19:10 Amir1: starting migration of public mailing lists in group b and c to mailman3 ([[phab:T280322|T280322]])
* 19:01 brennen: 1.37.0-wmf.4 train status ([[phab:T281145|T281145]]): deploying patch for [[phab:T282038|T282038]] and then rolling forward to group1.
* 18:59 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[46].eqsin.wmnet
* 18:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[35].eqsin.wmnet
* 18:43 tgr_: Morning deploys done
* 18:43 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: [[gerrit:685482{{!}}Prevent edit notices from appearing (T281960)]] (duration: 01m 08s)
* 18:42 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: [[gerrit:685483{{!}}Prevent edit notices from appearing (T281960)]] (duration: 01m 08s)
* 18:40 tgr@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:679938{{!}}flaggedrevs.php: Use MediaWikiServices, not an extension function]] (duration: 01m 08s)
* 18:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/Popups/includes: Backport: [[gerrit:685478{{!}}Enable Reference Previews for more users (T271206)]] (duration: 01m 08s)
* 18:33 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/includes: Backport: [[gerrit:685477{{!}}Enable Reference Previews for more users (T271206)]] (duration: 01m 11s)
* 18:24 tgr@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:677002{{!}}replace mwlog1001 with new mwlog[12]002 hosts (T224565)]] (duration: 01m 24s)
* 17:59 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
* 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
* 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe
* 17:59 mutante: adding a systemd timer to all thumbor servers that writes output of fc-list command into /srv/fc-list/fc-list ([[phab:T280718|T280718]])
* 17:58 XioNoX: push pfw policies - [[phab:T281942|T281942]]
* 17:10 ejegg: updated standalone SmashPig deploy from {{Gerrit|250a8570d1}} to {{Gerrit|be272c02ce}}
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15786 and previous config saved to /var/cache/conftool/dbconfig/20210505-155453-root.json
* 15:43 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga2001.wikimedia.org
* 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15785 and previous config saved to /var/cache/conftool/dbconfig/20210505-153949-root.json
* 15:25 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga2001.wikimedia.org
* 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15784 and previous config saved to /var/cache/conftool/dbconfig/20210505-152445-root.json
* 15:23 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga1001.wikimedia.org
* 15:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga1001.wikimedia.org
* 15:10 herron: decommissioning icinga[12]001 hosts [[phab:T279601|T279601]] [[phab:T279602|T279602]]
* 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 [[phab:T280751|T280751]]
* 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 [[phab:T280751|T280751]]
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 30%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15783 and previous config saved to /var/cache/conftool/dbconfig/20210505-150942-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15782 and previous config saved to /var/cache/conftool/dbconfig/20210505-145438-root.json
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15781 and previous config saved to /var/cache/conftool/dbconfig/20210505-144431-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15780 and previous config saved to /var/cache/conftool/dbconfig/20210505-143934-root.json
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15779 and previous config saved to /var/cache/conftool/dbconfig/20210505-142927-root.json
* 14:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Reimage db2129 [[phab:T280751|T280751]]
* 14:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Reimage db2129 [[phab:T280751|T280751]]
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15778 and previous config saved to /var/cache/conftool/dbconfig/20210505-142431-root.json
* 14:19 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
* 14:18 marostegui: Upgrade kernel and enable report_host on db1126
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host', diff saved to https://phabricator.wikimedia.org/P15777 and previous config saved to /var/cache/conftool/dbconfig/20210505-141735-marostegui.json
* 14:17 kormat@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15776 and previous config saved to /var/cache/conftool/dbconfig/20210505-141423-root.json
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15775 and previous config saved to /var/cache/conftool/dbconfig/20210505-135920-root.json
* 13:58 kevinbazira@deploy1002: Finished deploy [ores/deploy@5612f30]: Regular ORES Deployment [[phab:T278723|T278723]] (duration: 16m 47s)
* 13:48 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:685062{{!}}Revert "Enable ReferencePreviews on first wikis CommonSettings" ()]] (duration: 02m 08s)
* 13:41 kevinbazira@deploy1002: Started deploy [ores/deploy@5612f30]: Regular ORES Deployment [[phab:T278723|T278723]]
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P15774 and previous config saved to /var/cache/conftool/dbconfig/20210505-133259-marostegui.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15773 and previous config saved to /var/cache/conftool/dbconfig/20210505-133202-root.json
* 13:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Reimage db2129 [[phab:T280751|T280751]]
* 13:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Reimage db2129 [[phab:T280751|T280751]]
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15772 and previous config saved to /var/cache/conftool/dbconfig/20210505-131658-root.json
* 13:12 kormat: reimaging db2129 to buster [[phab:T280751|T280751]]
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15771 and previous config saved to /var/cache/conftool/dbconfig/20210505-130155-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15770 and previous config saved to /var/cache/conftool/dbconfig/20210505-124651-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P15769 and previous config saved to /var/cache/conftool/dbconfig/20210505-122351-marostegui.json
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15768 and previous config saved to /var/cache/conftool/dbconfig/20210505-121353-root.json
* 12:01 moritzm: installing exim security updates on stretch
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15767 and previous config saved to /var/cache/conftool/dbconfig/20210505-115849-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15765 and previous config saved to /var/cache/conftool/dbconfig/20210505-114345-root.json
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15764 and previous config saved to /var/cache/conftool/dbconfig/20210505-112842-root.json
* 11:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|3565427dcd80e78352c99eb322de3318ae89a4ee}}: Enable ReferencePreviews on first wikis ([[phab:T271206|T271206]]; 2/2) (duration: 01m 10s)
* 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4f3051bf286b89e47ef153532de76756f2e7ade9}}: Enable ReferencePreviews on first wikis ([[phab:T271206|T271206]]; 1/2) (duration: 01m 20s)
* 11:17 urbanecm@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|289dc34feeb0703bb45f4a71c149cd607ef26455}}: Enable new language button for all logged in users outside test projects ([[phab:T280526|T280526]]) (duration: 02m 24s)
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 09:54 hashar: Restarted Zuul / CI
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15762 and previous config saved to /var/cache/conftool/dbconfig/20210505-094945-root.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15761 and previous config saved to /var/cache/conftool/dbconfig/20210505-094005-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15760 and previous config saved to /var/cache/conftool/dbconfig/20210505-093441-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 80%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15759 and previous config saved to /var/cache/conftool/dbconfig/20210505-092501-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15758 and previous config saved to /var/cache/conftool/dbconfig/20210505-091938-root.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 70%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15757 and previous config saved to /var/cache/conftool/dbconfig/20210505-090957-root.json
* 09:08 hashar: Upgraded Jenkins ldap plugin from 1.26 to 2.6 # [[phab:T281737|T281737]]
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15756 and previous config saved to /var/cache/conftool/dbconfig/20210505-090434-root.json
* 08:55 hashar: Restarting CI Jenkins # [[phab:T281737|T281737]]
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15755 and previous config saved to /var/cache/conftool/dbconfig/20210505-085454-root.json
* 08:50 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:47 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15754 and previous config saved to /var/cache/conftool/dbconfig/20210505-083950-root.json
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P15753 and previous config saved to /var/cache/conftool/dbconfig/20210505-083810-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P15752 and previous config saved to /var/cache/conftool/dbconfig/20210505-082609-marostegui.json
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 35%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15751 and previous config saved to /var/cache/conftool/dbconfig/20210505-082446-root.json
* 08:13 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org buster-wikimedia
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15750 and previous config saved to /var/cache/conftool/dbconfig/20210505-080942-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15749 and previous config saved to /var/cache/conftool/dbconfig/20210505-075438-root.json
* 07:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15748 and previous config saved to /var/cache/conftool/dbconfig/20210505-073934-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15747 and previous config saved to /var/cache/conftool/dbconfig/20210505-073722-marostegui.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15746 and previous config saved to /var/cache/conftool/dbconfig/20210505-073653-root.json
* 07:35 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 07:35 moritzm: rolling restart of cassandra in eqiad to pick up Java security updates
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15745 and previous config saved to /var/cache/conftool/dbconfig/20210505-073416-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15744 and previous config saved to /var/cache/conftool/dbconfig/20210505-073223-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 15%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15743 and previous config saved to /var/cache/conftool/dbconfig/20210505-072431-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15742 and previous config saved to /var/cache/conftool/dbconfig/20210505-072149-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15741 and previous config saved to /var/cache/conftool/dbconfig/20210505-071912-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15740 and previous config saved to /var/cache/conftool/dbconfig/20210505-071720-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 [[phab:T281794|T281794]]', diff saved to https://phabricator.wikimedia.org/P15739 and previous config saved to /var/cache/conftool/dbconfig/20210505-071132-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15738 and previous config saved to /var/cache/conftool/dbconfig/20210505-070927-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15737 and previous config saved to /var/cache/conftool/dbconfig/20210505-070646-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15736 and previous config saved to /var/cache/conftool/dbconfig/20210505-070409-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15735 and previous config saved to /var/cache/conftool/dbconfig/20210505-070216-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15734 and previous config saved to /var/cache/conftool/dbconfig/20210505-065423-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15733 and previous config saved to /var/cache/conftool/dbconfig/20210505-065142-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15732 and previous config saved to /var/cache/conftool/dbconfig/20210505-064905-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15731 and previous config saved to /var/cache/conftool/dbconfig/20210505-064712-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json
* 06:41 marostegui: Check tables on db1112 (lag might show up on s3 on wiki replicas) [[phab:T280492|T280492]]
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15729 and previous config saved to /var/cache/conftool/dbconfig/20210505-063920-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15728 and previous config saved to /var/cache/conftool/dbconfig/20210505-062416-root.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15727 and previous config saved to /var/cache/conftool/dbconfig/20210505-060912-root.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1178 into dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15726 and previous config saved to /var/cache/conftool/dbconfig/20210505-060814-marostegui.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from API', diff saved to https://phabricator.wikimedia.org/P15725 and previous config saved to /var/cache/conftool/dbconfig/20210505-060636-marostegui.json
* 06:00 marostegui: Restart mysqld on x1 database primary master (db1103) [[phab:T281212|T281212]]
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311 into main traffic', diff saved to https://phabricator.wikimedia.org/P15724 and previous config saved to /var/cache/conftool/dbconfig/20210505-053841-marostegui.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 into s1 vslow, remove db1099:3311', diff saved to https://phabricator.wikimedia.org/P15723 and previous config saved to /var/cache/conftool/dbconfig/20210505-053211-marostegui.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15722 and previous config saved to /var/cache/conftool/dbconfig/20210505-052943-marostegui.json
* 04:53 eileen: civicrm revision changed from {{Gerrit|e7c610fd87}} to {{Gerrit|8034e47008}}, config revision is {{Gerrit|189788d452}}
* 03:58 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:56 ryankemper: [[phab:T280563|T280563]] Reboot of `eqiad` complete. Only ~half of `codfw` is remaining.
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:54 ryankemper: [[phab:T280382|T280382]] `wdqs1011.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
* 03:52 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:51 ryankemper: [[phab:T280382|T280382]] [WDQS] `ryankemper@wdqs2007:~$ sudo depool` (need to monitor host to see if it becomes ssh unreachable again or if it was a one-off; also high update lag)
* 03:50 ryankemper: [[phab:T280382|T280382]] `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
* 03:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 03:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 01:55 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043` from cluster
* 01:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:49 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` (will likely fail due to underlying hw but we'll see)
* 01:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 01:45 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 01:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:43 ryankemper: [[phab:T280382|T280382]] [WDQS] `racadm>>racadm serveraction powercycle` on `wdqs2007`
* 01:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 01:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 00:29 eileen: civicrm revision changed from {{Gerrit|94e321dbe0}} to {{Gerrit|e7c610fd87}}, config revision is {{Gerrit|189788d452}}
* 00:15 ejegg: updated payments-wiki from {{Gerrit|44570561f2}} to {{Gerrit|d449599540}}
* 00:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f6ea8c0e5a4dc667969f5847207902727625bbe}}: Growth: enwiki: Add list of mentors ([[phab:T281896|T281896]]) (duration: 01m 10s)
* 00:00 urbanecm@deploy1002: Synchronized fc-list: {{Gerrit|93970496da7678d896b7f812b3bb5f4cf0b691ad}}: update fc-list to current version on buster ([[phab:T79424|T79424]]) (duration: 01m 09s)


== 2021-05-04 ==
== 2021-12-03 ==
* 23:41 urbanecm@deploy1002: Synchronized wmf-config/config/enwiki.yaml: {{Gerrit|d29dbb2f435afe64f2fee15b430ee04d5d13c8d7}}: Enable Growth features on enwiki in the dark mode ([[phab:T281896|T281896]]; 3/3) (duration: 01m 09s)
* 20:29 cstone: revision changed from {{Gerrit|2c2e22cd}} to {{Gerrit|b82183b9}}
* 23:40 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|d29dbb2f435afe64f2fee15b430ee04d5d13c8d7}}: Enable Growth features on enwiki in the dark mode ([[phab:T281896|T281896]]; 2/3) (duration: 01m 09s)
* 17:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d29dbb2f435afe64f2fee15b430ee04d5d13c8d7}}: Enable Growth features on enwiki in the dark mode ([[phab:T281896|T281896]]; 1/3) (duration: 01m 09s)
* 17:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:31 urbanecm@deploy1002: Synchronized wmf-config/config/bgwiki.yaml: {{Gerrit|5b4c516a1d0461065e27cacec5d2b1cb315a2c07}}: Enable Growth team features in dark mode on bgwiki ([[phab:T280824|T280824]]; 3/3) (duration: 01m 09s)
* 17:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:30 urbanecm@deploy1002: sync-file aborted: {{Gerrit|5b4c516a1d0461065e27cacec5d2b1cb315a2c07}}: Enable Growth team features in dark mode on bgwiki ([[phab:T280824|T280824]]; 3/3) (duration: 00m 03s)
* 17:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:30 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|5b4c516a1d0461065e27cacec5d2b1cb315a2c07}}: Enable Growth team features in dark mode on bgwiki ([[phab:T280824|T280824]]; 2/3) (duration: 01m 09s)
* 17:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5b4c516a1d0461065e27cacec5d2b1cb315a2c07}}: Enable Growth team features in dark mode on bgwiki ([[phab:T280824|T280824]]; 1/3) (duration: 01m 09s)
* 17:35 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 23:26 Urbanecm: Create tables for GrowthExperiments extension on enwiki ([[phab:T281896|T281896]])
* 17:22 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 23:24 Urbanecm: Create tables for GrowthExperiments extension on bgwiki ([[phab:T280824|T280824]])
* 16:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|a3c24f322b754c9a94c260ee5df4b5ae4de27f22}}: Avoid using User::getGroups() and ::getEffectiveGroups() ([[phab:T281823|T281823]]) (duration: 01m 10s)
* 16:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e467d92e5e257a3d2f9b05692db9accdd86ddb00}}: Add extendedconfirmed on ptwiki ([[phab:T281926|T281926]]) (duration: 01m 10s)
* 16:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 23:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|012d6138741ea76c985453428111aeddfdec2271}}: Add extendedconfirmed on azwiki ([[phab:T281860|T281860]]) (duration: 01m 10s)
* 16:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 22:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 22:47 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 16:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 22:46 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 22:44 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 14:25 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner2001.codfw.wmnet
* 22:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 14:10 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner2001.codfw.wmnet
* 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 12:53 moritzm: installing nss security updates on stretch
* 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 12:37 moritzm: draining primary/secondary instances off ganeti2007 [[phab:T296622|T296622]]
* 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 12:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2022.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 21:30 eileen: civicrm revision changed from {{Gerrit|33a63d5789}} to {{Gerrit|94e321dbe0}}, config revision is {{Gerrit|a212d6ab23}}
* 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2022.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 21:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4 (duration: 03m 55s)
* 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
* 21:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
* 20:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2022.codfw.wmnet with OS buster
* 20:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2022.codfw.wmnet with OS buster
* 20:09 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7] (duration: 05m 16s)
* 11:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2011.codfw.wmnet with OS buster
* 20:04 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7]
* 11:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS buster
* 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7] (duration: 00m 07s)
* 11:06 jynus: stop and shutdown db1102 [[phab:T296546|T296546]]
* 20:03 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7]
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7] (duration: 17m 15s)
* 11:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 19:46 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7]
* 09:38 moritzm: draining primary/secondary instances off ganeti2011 [[phab:T296622|T296622]]
* 19:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.4
* 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2009.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 17:58 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.4 (duration: 42m 33s)
* 09:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2009.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 17:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead (duration: 01m 46s)
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
* 17:24 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead
* 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
* 17:16 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18019 and previous config saved to /var/cache/conftool/dbconfig/20211203-091537-marostegui.json
* 17:03 brennen: 1.37.0-wmf.4 was branched at {{Gerrit|f069fd8b5a6c817f4860fa68ae2f56b71a139f4a}} for [[phab:T281145|T281145]]
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18018 and previous config saved to /var/cache/conftool/dbconfig/20211203-090033-marostegui.json
* 17:00 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org bullseye-wikimedia
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2009.codfw.wmnet with OS buster
* 16:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead (duration: 01m 54s)
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161', diff saved to https://phabricator.wikimedia.org/P18017 and previous config saved to /var/cache/conftool/dbconfig/20211203-084528-marostegui.json
* 16:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead
* 08:44 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:16 dzahn@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:43 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:15 dzahn@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18016 and previous config saved to /var/cache/conftool/dbconfig/20211203-083023-marostegui.json
* 16:13 mutante: k8s: upgrading release=namespaces, helmfile apply to create miscweb namespace [[phab:T281538|T281538]]
* 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS buster
* 16:13 dzahn@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18015 and previous config saved to /var/cache/conftool/dbconfig/20211203-082859-marostegui.json
* 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 16:12 dzahn@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18014 and previous config saved to /var/cache/conftool/dbconfig/20211203-082848-marostegui.json
* 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18013 and previous config saved to /var/cache/conftool/dbconfig/20211203-081343-marostegui.json
* 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110', diff saved to https://phabricator.wikimedia.org/P18012 and previous config saved to /var/cache/conftool/dbconfig/20211203-075839-marostegui.json
* 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18011 and previous config saved to /var/cache/conftool/dbconfig/20211203-074334-marostegui.json
* 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18010 and previous config saved to /var/cache/conftool/dbconfig/20211203-073910-marostegui.json
* 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:46 moritzm: installing exim security updates on buster
* 07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15721 and previous config saved to /var/cache/conftool/dbconfig/20210504-133950-root.json
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15720 and previous config saved to /var/cache/conftool/dbconfig/20210504-132446-root.json
* 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:14 moritzm: upgrading linux-libc-dev on buster hosts (to version introduced by 10.9 point release)
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18009 and previous config saved to /var/cache/conftool/dbconfig/20211203-073404-marostegui.json
* 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18008 and previous config saved to /var/cache/conftool/dbconfig/20211203-071900-marostegui.json
* 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P18007 and previous config saved to /var/cache/conftool/dbconfig/20211203-070355-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15719 and previous config saved to /var/cache/conftool/dbconfig/20210504-130943-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18006 and previous config saved to /var/cache/conftool/dbconfig/20211203-064850-marostegui.json
* 13:01 moritzm: installing debian-archive-keyring updates on buster
* 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15718 and previous config saved to /var/cache/conftool/dbconfig/20210504-125439-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18005 and previous config saved to /var/cache/conftool/dbconfig/20211203-062019-marostegui.json
* 12:50 marostegui: Upgrade mysql and kernel on db1137 [[phab:T281212|T281212]]
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql [[phab:T281212|T281212]]', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15716 and previous config saved to /var/cache/conftool/dbconfig/20210504-124848-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18004 and previous config saved to /var/cache/conftool/dbconfig/20211203-062011-marostegui.json
* 12:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after sanitarium master switch [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15715 and previous config saved to /var/cache/conftool/dbconfig/20210504-124647-kormat.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18003 and previous config saved to /var/cache/conftool/dbconfig/20211203-060506-marostegui.json
* 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling for sanitarium master switch [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15714 and previous config saved to /var/cache/conftool/dbconfig/20210504-123537-kormat.json
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 [[phab:T280751|T280751]]
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P18002 and previous config saved to /var/cache/conftool/dbconfig/20211203-055001-marostegui.json
* 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 [[phab:T280751|T280751]]
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18001 and previous config saved to /var/cache/conftool/dbconfig/20211203-053457-marostegui.json
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15713 and previous config saved to /var/cache/conftool/dbconfig/20210504-123344-root.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P18000 and previous config saved to /var/cache/conftool/dbconfig/20211203-053032-marostegui.json
* 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 05:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|683b876}}: {{Gerrit|5763630}}: GrowthExperiments: Rename control variant to control, GrowthExperiments: Set linkrecommendation variant to 0 ([[phab:T281727|T281727]]) (duration: 00m 58s)
* 01:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster
* 12:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|8f938c2}}: {{Gerrit|c8c07ab}}: GrowthExperiments backports ([[phab:T281727|T281727]]) (duration: 00m 59s)
* 01:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15712 and previous config saved to /var/cache/conftool/dbconfig/20210504-121841-root.json
* 01:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster
* 12:08 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 01:01 tgr: UTC late deploys done
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15711 and previous config saved to /var/cache/conftool/dbconfig/20210504-120337-root.json
* 01:00 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:743177{{!}}Add an image: Add test version of GEInfoboxTemplates (T291232)]] (duration: 00m 57s)
* 11:58 marostegui: Upgrade mysql and kernel on db1120 [[phab:T281212|T281212]]
* 00:44 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/python3-imagecatalog/imagecatalog_0.0.1-1_amd64.changes
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql [[phab:T281212|T281212]]', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json
* 00:37 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes: Backport: [[gerrit:743178{{!}}Avoid references to TemplateCollectionFeature]] step2 (duration: 00m 56s)
* 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 00:36 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Config/Validation/GrowthConfigValidation.php: Backport: [[gerrit:743178{{!}}Avoid references to TemplateCollectionFeature]] step 1 (duration: 00m 56s)
* 11:31 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] );` on arwiki, bnwiki, viwiki ([[phab:T278710|T278710]], [[phab:T281703|T281703]])
* 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS buster
* 11:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87dff0b1abe588f0ddc62985fdb40b5ec0fa1bbd}}: GrowthExperiments: Enable link recommendations for target wikis ([[phab:T278710|T278710]]) (duration: 00m 57s)
* 11:10 Urbanecm: Create growthexperiments_link_recommendations and growthexperiments_link_submissions on arwiki,bnwiki,viwiki x1 ([[phab:T266913|T266913]])
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8228f6beacd2f7e94a65f32d41f558c0f440db0a}}: Disable ContentTranslation New article campaign in fiwiki ([[phab:T277473|T277473]]) (duration: 00m 59s)
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15707 and previous config saved to /var/cache/conftool/dbconfig/20210504-102649-root.json
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15705 and previous config saved to /var/cache/conftool/dbconfig/20210504-101145-root.json
* 09:57 moritzm: installing bind9 security updates on buster (client side tools/libs only)
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15704 and previous config saved to /var/cache/conftool/dbconfig/20210504-095642-root.json
* 09:45 godog: +50G for prometheus k8s in codfw
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15703 and previous config saved to /var/cache/conftool/dbconfig/20210504-094138-root.json
* 09:04 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 09:04 moritzm: rolling restart of cassandra in codfw to pick up Java security updates
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15702 and previous config saved to /var/cache/conftool/dbconfig/20210504-081716-root.json
* 08:02 marostegui: Check tables on db1106, lag will show up on s1 on wiki replicas ([[phab:T280492|T280492]])
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15701 and previous config saved to /var/cache/conftool/dbconfig/20210504-080213-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15700 and previous config saved to /var/cache/conftool/dbconfig/20210504-080212-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15698 and previous config saved to /var/cache/conftool/dbconfig/20210504-074639-root.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15697 and previous config saved to /var/cache/conftool/dbconfig/20210504-074632-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15696 and previous config saved to /var/cache/conftool/dbconfig/20210504-073135-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15695 and previous config saved to /var/cache/conftool/dbconfig/20210504-073127-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15694 and previous config saved to /var/cache/conftool/dbconfig/20210504-071632-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15693 and previous config saved to /var/cache/conftool/dbconfig/20210504-071623-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15691 and previous config saved to /var/cache/conftool/dbconfig/20210504-065034-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15690 and previous config saved to /var/cache/conftool/dbconfig/20210504-063530-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15689 and previous config saved to /var/cache/conftool/dbconfig/20210504-062027-root.json
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15688 and previous config saved to /var/cache/conftool/dbconfig/20210504-061700-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15687 and previous config saved to /var/cache/conftool/dbconfig/20210504-060523-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15686 and previous config saved to /var/cache/conftool/dbconfig/20210504-060156-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 to clone db1178 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15684 and previous config saved to /var/cache/conftool/dbconfig/20210504-055116-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15683 and previous config saved to /var/cache/conftool/dbconfig/20210504-055020-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15682 and previous config saved to /var/cache/conftool/dbconfig/20210504-054653-root.json
* 05:45 marostegui: Stop mysql on db1158 to clone db1178
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 to clone db1178 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15680 and previous config saved to /var/cache/conftool/dbconfig/20210504-054539-marostegui.json
* 05:36 marostegui: Deploy schema change on s6 codfw, lag will appear - [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15678 and previous config saved to /var/cache/conftool/dbconfig/20210504-053149-root.json
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15677 and previous config saved to /var/cache/conftool/dbconfig/20210504-052612-root.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15676 and previous config saved to /var/cache/conftool/dbconfig/20210504-051108-root.json
* 05:07 marostegui: Restart sanitarium hosts to pick up new filters [[phab:T263817|T263817]]
* 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15675 and previous config saved to /var/cache/conftool/dbconfig/20210504-045605-root.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json
* 04:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:36 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]`
* 03:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
* 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
* 01:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
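
The staged repools logged above (for example db1082 moving through 10%, 25%, 50%, 75% and 100%) can be pulled out of the entries with a small script. This is a minimal sketch, not part of dbctl or any Wikimedia tooling: it assumes each entry has the shape `* HH:MM actor: message`, and the names `ENTRY`, `REPOOL`, `parse_entry`, `repool_steps` and `SAMPLE` are hypothetical helpers introduced only for illustration. The sample messages are shortened copies of entries from this log.

```python
import re

# Assumed entry shape: "* HH:MM actor: message" (inferred from the entries above).
ENTRY = re.compile(r"^\* (?P<time>\d{2}:\d{2}) (?P<actor>[^:\s]+): (?P<message>.*)$")
# Assumed repool message shape: "'<host> (re)pooling @ <N>%: ...".
REPOOL = re.compile(r"'(?P<host>db\d+(?::\d+)?) \(re\)pooling @ (?P<pct>\d+)%")


def parse_entry(line):
    """Split one log line into (time, actor, message); None if it does not match."""
    m = ENTRY.match(line)
    if m is None:
        return None
    return m.group("time"), m.group("actor"), m.group("message")


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} sorted by time within a single day."""
    steps = {}
    for line in lines:
        parsed = parse_entry(line)
        if parsed is None:
            continue
        time, _actor, message = parsed
        m = REPOOL.search(message)
        if m:
            steps.setdefault(m.group("host"), []).append((time, int(m.group("pct"))))
    for host in steps:
        steps[host].sort()
    return steps


# Shortened copies of entries from the log, newest first as on the page.
SAMPLE = [
    "* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082'",
    "* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082'",
    "* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082'",
    "* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082'",
    "* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082'",
    "* 09:45 godog: +50G for prometheus k8s in codfw",
]

if __name__ == "__main__":
    for host, history in repool_steps(SAMPLE).items():
        print(host, history)
```

Sorting on the `HH:MM` string only works within one day's section; entries spanning several dates would need the section heading's date attached first.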


== 2021-05-03 ==
== 2021-12-02 ==
* 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|230ef5716b34ca83348667f289180313b76ce8a3}}: Prepare for new configuration option ([[phab:T277951|T277951]]) (duration: 00m 57s)
* 20:05 legoktm: re-pooling mw1414 following testing
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7c47ee17b3936fb1f79590187a9e0028276e4a9d}}: Replace $wgRelatedArticlesFooterWhitelistedSkins ([[phab:T277958|T277958]]) (duration: 00m 57s)
* 19:35 legoktm: installing yaml PHP extension on canaries
* 23:14 urbanecm@deploy1002: sync-file aborted: {{Gerrit|7c47ee17b3936fb1f79590187a9e0028276e4a9d}}: Replace $wgRelatedArticlesFooterWhitelistedSkins ([[phab:T277958|T277958]]) (duration: 00m 01s)
* 19:29 andrewbogott: upgrading wikitech-static deb packages as well as moving to mediawiki 1.37.0
* 22:17 legoktm: ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
* 19:26 majavah: UTC evening deploys done
* 22:14 mutante: [backup1001:~] $ sudo check_bacula.py --icinga
* 19:26 taavi@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/webUIScroll.js: Backport: [[gerrit:743227{{!}}Update scroll instrument (T294246)]] (duration: 00m 56s)
* 21:56 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]`
* 19:22 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720363{{!}}Drop old config names for CentralAuth denylist controls (T277932)]] (duration: 00m 56s)
* 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 19:12 taavi@deploy1002: Synchronized wmf-config: Config: [[gerrit:739032{{!}}GrowthExperiments configuration fixes (T294737)]] (duration: 00m 57s)
* 21:54 ryankemper: [[phab:T280563|T280563]] eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
* 18:56 legoktm: upgraded scap to 4.1.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary ([[phab:T296867|T296867]])
* 21:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 18:45 legoktm: uploaded scap 4.1.0 to apt.wm.o ([[phab:T296867|T296867]])
* 21:47 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]`
* 18:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 21:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 18:19 vgutierrez: re-enable puppet on cp3064 - [[phab:T296874|T296874]]
* 21:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d95b91648}} (duration: 00m 58s)
* 18:14 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
* 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
* 17:51 vgutierrez: puppet disabled on cp3064 to manually increase number of maxconns in HAProxy - [[phab:T296874|T296874]]
* 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
* 17:38 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/743216/; as a result of the fix `'-Dwdqs.throttling-filter.time-bucket-capacity-in-seconds=240', '-Dwdqs.throttling-filter.time-bucket-refill-amount-in-seconds=120', '-Dwdqs.throttling-filter.ban-duration-in-minutes=60'` will now be in the `extra_jvm_opts` for `wdqs-internal` hosts
* 21:22 ryankemper: [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
* 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2022.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 21:20 ryankemper: [[phab:T280382|T280382]] [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
* 15:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2022.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage
* 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17997 and previous config saved to /var/cache/conftool/dbconfig/20211202-145151-marostegui.json
* 21:09 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17996 and previous config saved to /var/cache/conftool/dbconfig/20211202-143646-marostegui.json
* 21:06 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17995 and previous config saved to /var/cache/conftool/dbconfig/20211202-142141-marostegui.json
* 21:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17994 and previous config saved to /var/cache/conftool/dbconfig/20211202-140636-marostegui.json
* 21:02 ryankemper: [[phab:T280382|T280382]] `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  975G  1.5T  39% /srv`
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17993 and previous config saved to /var/cache/conftool/dbconfig/20211202-140557-marostegui.json
* 20:56 ryankemper: [[phab:T280382|T280382]] [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
* 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 20:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 20:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17992 and previous config saved to /var/cache/conftool/dbconfig/20211202-140548-marostegui.json
* 20:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17990 and previous config saved to /var/cache/conftool/dbconfig/20211202-135043-marostegui.json
* 20:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:49 hnowlan: roll-restarting tilerator,tileratorui,kartotherian in eqiad
* 19:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 13:37 hnowlan: roll-restarting tilerator,tileratorui,kartotherian in codfw
* 19:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17989 and previous config saved to /var/cache/conftool/dbconfig/20211202-133538-marostegui.json
* 19:24 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17988 and previous config saved to /var/cache/conftool/dbconfig/20211202-132034-marostegui.json
* 19:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2128 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17987 and previous config saved to /var/cache/conftool/dbconfig/20211202-131959-marostegui.json
* 19:21 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
* 13:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2094,2128].codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 19:21 ryankemper: [[phab:T280382|T280382]] [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2094,2128].codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 18:20 Urbanecm: Morning B&C window done
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17986 and previous config saved to /var/cache/conftool/dbconfig/20211202-131949-marostegui.json
* 18:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: {{Gerrit|cf9d9da3bf272d33c2d9b29d9172b1c81bfd8beb}}: Hotfix: loadRelatedArticles should consider existence of container element ([[phab:T281547|T281547]]) (duration: 00m 57s)
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17985 and previous config saved to /var/cache/conftool/dbconfig/20211202-130444-marostegui.json
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: {{Gerrit|bc1bc903169e4982c0c5a930094bed9f22616293}}: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads ([[phab:T281650|T281650]]; 2/2) (duration: 00m 57s)
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17983 and previous config saved to /var/cache/conftool/dbconfig/20211202-124940-marostegui.json
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|bc1bc903169e4982c0c5a930094bed9f22616293}}: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads ([[phab:T281650|T281650]]; 1/2) (duration: 00m 58s)
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17982 and previous config saved to /var/cache/conftool/dbconfig/20211202-123435-marostegui.json
* 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2113 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17981 and previous config saved to /var/cache/conftool/dbconfig/20211202-123356-marostegui.json
* 17:20 hashar: Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # [[phab:T281737|T281737]]
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 16:30 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 16:29 ryankemper: [[phab:T281498|T281498]] `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17980 and previous config saved to /var/cache/conftool/dbconfig/20211202-123348-marostegui.json
* 16:27 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
* 12:31 moritzm: installing NSS security updates
* 16:19 legoktm: legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
* 12:27 Lucas_WMDE: UTC morning backport+config window done
* 15:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:743116{{!}}Wikisource: enable proofreading change-tagging for all Wikisources (T289140)]] (duration: 00m 57s)
* 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17979 and previous config saved to /var/cache/conftool/dbconfig/20211202-121843-marostegui.json
* 15:27 Amir1: upgrade group A to mailman3 ([[phab:T280322|T280322]])
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2009.codfw.wmnet with OS buster
* 14:27 volans: uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17978 and previous config saved to /var/cache/conftool/dbconfig/20211202-120338-marostegui.json
* 13:43 volans: uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2009.codfw.wmnet with OS buster
* 13:10 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user ([[phab:T281703|T281703]])
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17977 and previous config saved to /var/cache/conftool/dbconfig/20211202-114833-marostegui.json
* 12:36 kostajh: Backport window done
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17976 and previous config saved to /var/cache/conftool/dbconfig/20211202-114755-marostegui.json
* 12:33 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:684378{{!}}GrowthExperiments: Set default variant (T278123)]] [[gerrit:684331{{!}}GrowthExperiments: enable link recommendations frontend on cswiki (T278710)]] (duration: 00m 57s)
* 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:07 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:684327{{!}}GrowthExperiments: enable link recommendations backend on cswiki (T278710)]] (duration: 00m 57s)
* 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:56 kharlan@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:684080{{!}}refreshLinkRecommendations.php: Use per-wiki locks]] [[gerrit:684078{{!}}Handle DB readonly errors (T281382)]] (duration: 00m 58s)
* 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: {{Gerrit|a438b641c81fa16faba287407012beaff8b1f3ba}}: Fix settings dialog offering ReferencePreviews when unavailable ([[phab:T281352|T281352]]) (duration: 00m 58s)
* 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c5a7c67b4daf33e0f9aaabec3f35ab6d4184894b}}: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 11:47 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f1a5ef0116c77b86b1abfb7bfa7d4ed363c69f61}}: wikidata: post edit constraint jobs on 70% of edits ([[phab:T204031|T204031]]) (duration: 00m 57s)
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17975 and previous config saved to /var/cache/conftool/dbconfig/20211202-114711-marostegui.json
* 10:59 moritzm: installing avahi security updates on buster
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17974 and previous config saved to /var/cache/conftool/dbconfig/20211202-113206-marostegui.json
* 10:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:684302{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 11:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:684302{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:21 moritzm: draining primary/secondary instances off ganeti2022 [[phab:T296622|T296622]]
* 09:42 moritzm: installing python3.7 security updates
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17973 and previous config saved to /var/cache/conftool/dbconfig/20211202-111702-marostegui.json
* 09:41 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17972 and previous config saved to /var/cache/conftool/dbconfig/20211202-110157-marostegui.json
* 09:12 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2089:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17971 and previous config saved to /var/cache/conftool/dbconfig/20211202-110120-marostegui.json
* 09:10 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
* 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:10 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
* 11:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2089.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:09 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17970 and previous config saved to /var/cache/conftool/dbconfig/20211202-110110-marostegui.json
* 08:52 joal@deploy1002: Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17969 and previous config saved to /var/cache/conftool/dbconfig/20211202-104606-marostegui.json
* 08:01 moritzm: installing edk2 security updates
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17968 and previous config saved to /var/cache/conftool/dbconfig/20211202-103100-marostegui.json
* 07:31 moritzm: installing libimage-exiftool-perl security updates
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17967 and previous config saved to /var/cache/conftool/dbconfig/20211202-101555-marostegui.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2075 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17966 and previous config saved to /var/cache/conftool/dbconfig/20211202-101522-marostegui.json
* 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2075.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Maintenance [[phab:T277354|T277354]]
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Maintenance [[phab:T277354|T277354]]
* 10:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17964 and previous config saved to /var/cache/conftool/dbconfig/20211202-100307-marostegui.json
* 09:52 moritzm: draining primary/secondary instances off ganeti2009 [[phab:T296622|T296622]]
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17963 and previous config saved to /var/cache/conftool/dbconfig/20211202-094802-marostegui.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17962 and previous config saved to /var/cache/conftool/dbconfig/20211202-093257-marostegui.json
* 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to ganeti01.svc.codfw.wmnet
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17961 and previous config saved to /var/cache/conftool/dbconfig/20211202-091753-marostegui.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17960 and previous config saved to /var/cache/conftool/dbconfig/20211202-091629-marostegui.json
* 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
* 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2010.codfw.wmnet with OS buster
* 08:29 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 4h)
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 02:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1028.eqiad.wmnet with OS buster
* 02:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 02:40 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
* 02:15 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 02:14 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1028.eqiad.wmnet with OS buster
* 01:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS buster
* 01:21 ryankemper: [[phab:T280001|T280001]] Rolling restart of low-traffic pybal hosts complete. All of `wcqs` is pooled and the pybal / ipvs related alerts have cleared
* 01:16 ryankemper: [[phab:T280001|T280001]] Pooled `wcqs200[1-3]` (had been left unpooled from when we last removed wcqs from production)
* 01:12 ryankemper: [[phab:T280001|T280001]] Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2009*,lvs1015*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 01:11 ryankemper: [[phab:T280001|T280001]] Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
* 01:08 ryankemper: [[phab:T280001|T280001]] Sanity check of `sudo ipvsadm -L -n` on backups `lvs2010` and `lvs1016` looks good (e.g. `lvs1016` has `TCP  10.2.2.67:443 wrr`)
* 01:07 ryankemper: [[phab:T280001|T280001]] Restarting pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2010*,lvs1016*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 01:02 ryankemper: [[phab:T280001|T280001]] `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
* 01:01 ryankemper: [[phab:T280001|T280001]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841
* 01:00 ryankemper: [[phab:T280001|T280001]] About to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/742841 to bring `wcqs` into state `lvs_setup`, after which I'll perform a rolling restart of pybal
* 00:24 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/skins/Vector/: {{Gerrit|a7586cd4a2559248ea1fd29cf74de535de016501}}: Update scroll observer to allow event logging ([[phab:T292586|T292586]]) (duration: 00m 57s)
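
The pybal entries above (01:07 to 01:21) follow a rolling-restart pattern: restart the backup hosts, sanity-check them, wait for alerts to settle, then restart the primaries. A minimal sketch of that ordering is below. The function name `rolling_restart` and the `restart`, `healthy` and `sleep` callbacks are hypothetical placeholders, not part of cumin or pybal; in practice the restart step would be something like the logged `cumin` invocation and the health check a manual look at `ipvsadm` output and Icinga.

```python
import time
from typing import Callable, List, Sequence


def rolling_restart(
    groups: Sequence[Sequence[str]],
    restart: Callable[[Sequence[str]], None],
    healthy: Callable[[Sequence[str]], bool],
    settle_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart host groups one at a time, in the order given.

    Each group is restarted and then checked; if the check fails, the
    procedure stops before any later group is touched. Between groups
    the function waits settle_seconds so alerts can clear.
    All callbacks are hypothetical stand-ins supplied by the caller.
    """
    done: List[str] = []
    for index, hosts in enumerate(groups):
        restart(hosts)
        if not healthy(hosts):
            raise RuntimeError(f"health check failed for {list(hosts)}")
        done.extend(hosts)
        # Only wait if another group is still to come.
        if index < len(groups) - 1:
            sleep(settle_seconds)
    return done
```

Called with `[["lvs2010", "lvs1016"], ["lvs2009", "lvs1015"]]`, this handles the backups first and the primaries second, matching the order in the log. Injecting `sleep` keeps the sketch testable without real waiting.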


== 2021-05-02 ==
== 2021-12-01 ==
* 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 22:15 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 22:15 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:13 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 22:13 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 07s)
* 22:12 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 01m 23s)
* 22:11 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:10 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 22:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:10 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 22:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 22:09 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 22:09 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 21:12 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 21:12 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 21:11 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 16s)
* 21:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 21:10 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257]: (no justification provided)
* 21:09 razzi@deploy1002: Finished deploy [analytics/refinery@3b1b794]: Regular analytics weekly train [analytics/refinery@3b1b794] (duration: 21m 18s)
* 21:06 jynus: installing python-monotonic on ms-fe2011, ms-fe2012 (breaks swift-proxy)
* 21:02 jynus: installing python-monotonic on ms-fe2010
* 20:48 razzi@deploy1002: Started deploy [analytics/refinery@3b1b794]: Regular analytics weekly train [analytics/refinery@3b1b794]
* 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:46 otto@deploy1002: Finished deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 19:46 otto@deploy1002: Started deploy [airflow-dags/analytics@2f59257] (hadoop-test): (no justification provided)
* 19:30 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 22s)
* 19:30 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 19:27 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 02m 26s)
* 19:25 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 19:24 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 19:24 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 19:24 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 19:24 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 19:18 otto@deploy1002: Finished deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided) (duration: 00m 03s)
* 19:18 otto@deploy1002: Started deploy [airflow-dags/analytics@bea2abe] (hadoop-test): (no justification provided)
* 19:13 majavah: UTC evening deploys done
* 19:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742834{{!}}Add mediawiki.web_ui_scroll stream (T292586)]] (duration: 00m 57s)
* 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1089.eqiad.wmnet with OS buster
* 18:39 vgutierrez: pool cp1089 using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 17:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1089.eqiad.wmnet with OS buster
* 17:54 vgutierrez: depool cp1089 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 16:08 moritzm: installing postgresql-9.6 security updates
* 15:54 godog: bounce logstash on eqiad/codfw to apply template changes
* 15:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 15:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 15:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 15:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 15:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 15:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17955 and previous config saved to /var/cache/conftool/dbconfig/20211201-150853-marostegui.json
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17954 and previous config saved to /var/cache/conftool/dbconfig/20211201-145348-marostegui.json
* 14:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2010.codfw.wmnet with OS buster
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17953 and previous config saved to /var/cache/conftool/dbconfig/20211201-143843-marostegui.json
* 14:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetboard1001.eqiad.wmnet
* 14:29 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard1001.eqiad.wmnet
* 14:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetboard2001.codfw.wmnet
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17951 and previous config saved to /var/cache/conftool/dbconfig/20211201-142339-marostegui.json
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17950 and previous config saved to /var/cache/conftool/dbconfig/20211201-142227-marostegui.json
* 14:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 14:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17949 and previous config saved to /var/cache/conftool/dbconfig/20211201-142219-marostegui.json
* 14:13 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 14:13 jynus: started commonswiki codfw media backup at 8 threads of parallelism
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17948 and previous config saved to /var/cache/conftool/dbconfig/20211201-140715-marostegui.json
* 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS buster
* 13:56 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17947 and previous config saved to /var/cache/conftool/dbconfig/20211201-135210-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17946 and previous config saved to /var/cache/conftool/dbconfig/20211201-133705-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17945 and previous config saved to /var/cache/conftool/dbconfig/20211201-133554-marostegui.json
* 13:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17944 and previous config saved to /var/cache/conftool/dbconfig/20211201-133546-marostegui.json
* 13:30 moritzm: set "sudo gnt-cluster modify --hypervisor-parameters kvm:machine_version=pc-i440fx-2.8" for ganeti eqiad cluster [[phab:T294120|T294120]]
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17942 and previous config saved to /var/cache/conftool/dbconfig/20211201-132041-marostegui.json
* 13:19 vgutierrez: restore haproxy 2.2.9 on cp3064 - [[phab:T290005|T290005]]
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17939 and previous config saved to /var/cache/conftool/dbconfig/20211201-130536-marostegui.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17938 and previous config saved to /var/cache/conftool/dbconfig/20211201-125031-marostegui.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17937 and previous config saved to /var/cache/conftool/dbconfig/20211201-124919-marostegui.json
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17936 and previous config saved to /var/cache/conftool/dbconfig/20211201-122020-marostegui.json
* 12:11 urbanecm: EU B&C window done
* 12:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8ab29b2feb47d611873cf0465b2a2dd5eac0ad2}}: enwikisource: enable anonymous talk page mobile tabs ([[phab:T47955|T47955]]) (duration: 00m 56s)
* 12:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2bd14e8968c90b2562f045457d61b252728e6250}}: Add templateeditor group and protection level at viwiki ([[phab:T296154|T296154]]) (duration: 00m 56s)
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17935 and previous config saved to /var/cache/conftool/dbconfig/20211201-120515-marostegui.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17934 and previous config saved to /var/cache/conftool/dbconfig/20211201-115011-marostegui.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17933 and previous config saved to /var/cache/conftool/dbconfig/20211201-113506-marostegui.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17932 and previous config saved to /var/cache/conftool/dbconfig/20211201-113354-marostegui.json
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db[1155,1165].eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:31 vgutierrez: test HAProxy 2.4.9 on cp3064 - [[phab:T290005|T290005]]
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1140.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17931 and previous config saved to /var/cache/conftool/dbconfig/20211201-112952-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17930 and previous config saved to /var/cache/conftool/dbconfig/20211201-111448-marostegui.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17929 and previous config saved to /var/cache/conftool/dbconfig/20211201-105943-marostegui.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17928 and previous config saved to /var/cache/conftool/dbconfig/20211201-104438-marostegui.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17927 and previous config saved to /var/cache/conftool/dbconfig/20211201-104316-marostegui.json
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17926 and previous config saved to /var/cache/conftool/dbconfig/20211201-104308-marostegui.json
* 10:29 Lucas_WMDE: Deployed patch for [[phab:T296578|T296578]]
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17925 and previous config saved to /var/cache/conftool/dbconfig/20211201-102804-marostegui.json
* 10:23 vgutierrez: test haproxy_2.2.19-1~bpo10+1 on cp3064 - [[phab:T290005|T290005]]
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17924 and previous config saved to /var/cache/conftool/dbconfig/20211201-101259-marostegui.json
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17923 and previous config saved to /var/cache/conftool/dbconfig/20211201-095754-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17922 and previous config saved to /var/cache/conftool/dbconfig/20211201-095632-marostegui.json
* 09:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1098.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17921 and previous config saved to /var/cache/conftool/dbconfig/20211201-095624-marostegui.json
* 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:46 taavi@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:742925{{!}}beta: Update mx host]] (duration: 00m 55s)
* 09:43 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwiki extensions/CheckUser/maintenance/fixTrailingSpacesInLogs.php
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17920 and previous config saved to /var/cache/conftool/dbconfig/20211201-094120-marostegui.json
* 09:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevision.php: Backport: [[gerrit:742853{{!}}Drop using ft_title and ft_namespace (T296380)]] (duration: 00m 56s)
* 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17919 and previous config saved to /var/cache/conftool/dbconfig/20211201-092615-marostegui.json
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2005.codfw.wmnet with reason: Switch to DRBD for migration
* 09:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2005.codfw.wmnet with reason: Switch to DRBD for migration
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17918 and previous config saved to /var/cache/conftool/dbconfig/20211201-091110-marostegui.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17917 and previous config saved to /var/cache/conftool/dbconfig/20211201-090948-marostegui.json
* 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 09:03 vgutierrez: rolling restart of haproxy and varnish on O:cache::text_haproxy and O:cache::upload_haproxy - [[phab:T290005|T290005]]
* 08:56 moritzm: draining primary/secondary instance off ganeti2010 [[phab:T296622|T296622]]
* 08:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:32 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2117.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2117.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:32 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/NewcomerTasksUserOptionsLookup.php: Backport: [[gerrit:742548{{!}}Newcomer tasks: Fix filtering of non-existent task types (T296366)]] (duration: 00m 56s)
* 00:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742817{{!}}Enable A/B test enrollment instrumentation. (T292587)]] (duration: 00m 56s)
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
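
Entries in this log share a common shape: a bullet, an `HH:MM` timestamp, an actor (either `user` or `user@host`), a colon, and a free-text message that may contain `[[phab:T…|T…]]` task links. A minimal sketch of parsing one entry is below. The `Entry` class, the `parse_entry` function and both regular expressions are assumptions inferred from the lines shown here, not an official parser or a documented format.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed entry shape: "* HH:MM actor: message", actor has no spaces or colons.
ENTRY_RE = re.compile(r"^\* (?P<time>\d{2}:\d{2}) (?P<actor>[^:\s]+): (?P<message>.*)$")
# Assumed task link shape: [[phab:T12345|T12345]]
TASK_RE = re.compile(r"\[\[phab:(T\d+)\|")


@dataclass
class Entry:
    time: str            # "HH:MM" as written in the log
    user: str            # part of the actor before "@"
    host: Optional[str]  # part after "@", or None if absent
    message: str         # remaining free text
    tasks: List[str]     # Phabricator task IDs linked in the message


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one log line; return None for headings, blanks, or other text."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    user, _, host = match.group("actor").partition("@")
    message = match.group("message")
    return Entry(
        time=match.group("time"),
        user=user,
        host=host or None,
        message=message,
        tasks=TASK_RE.findall(message),
    )
```

Date headings such as `== 2021-12-01 ==` do not match and return `None`, so a caller can track the current date separately while iterating over lines.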


== 2021-05-01 ==
== 2021-11-30 ==
* 19:12 Urbanecm: Invalidate password for MaraBot@SUL ([[phab:T281586|T281586]])
* 23:59 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
* 23:57 mutante: deploy1002 - kube_env miscweb staging ; helmfile -e staging destroy
* 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos ([[phab:T280908|T280908]]) (duration: 00m 56s)
* 23:56 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:09 mutante: gerrit - added Majavah to wmf-deployment group for [[phab:T296777|T296777]]
* 22:30 krinkle@deploy1002: Finished deploy [integration/docroot@2af7007]: {{Gerrit|Ia89b6591639e5}} (duration: 00m 09s)
* 22:30 krinkle@deploy1002: Started deploy [integration/docroot@2af7007]: {{Gerrit|Ia89b6591639e5}}
* 22:21 mutante: welcome Majavah to MediaWiki deployers ([[phab:T296777|T296777]])
* 20:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5443b78f197b782238632966891d721859733a74}}: uzwiki: Deploy Growth features to newcomers ([[phab:T294245|T294245]]) (duration: 00m 57s)
* 18:09 legoktm: uploaded php-yaml for component/php72 ([[phab:T296331|T296331]])
* 18:08 vgutierrez: restart haproxy on cp3064 - [[phab:T290005|T290005]]
* 17:44 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17912 and previous config saved to /var/cache/conftool/dbconfig/20211130-174434-jynus.json
* 17:39 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 50%', diff saved to https://phabricator.wikimedia.org/P17911 and previous config saved to /var/cache/conftool/dbconfig/20211130-173935-jynus.json
* 17:35 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 25%', diff saved to https://phabricator.wikimedia.org/P17910 and previous config saved to /var/cache/conftool/dbconfig/20211130-173517-jynus.json
* 17:34 moritzm: installing libvorbis security updates
* 17:15 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1163 at 5%', diff saved to https://phabricator.wikimedia.org/P17908 and previous config saved to /var/cache/conftool/dbconfig/20211130-171550-jynus.json
* 17:00 jynus: move db1139:s1 under db1118
* 16:57 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17907 and previous config saved to /var/cache/conftool/dbconfig/20211130-165718-jynus.json
* 16:29 XioNoX: Move cr2-codfw lumen transit link to BO cable - [[phab:T289241|T289241]]
* 16:26 XioNoX: Move cr2-codfw eqord link to BO cable - [[phab:T289241|T289241]]
* 16:23 XioNoX: Move cr2-codfw pfw3 link to BO cable - [[phab:T289241|T289241]]
* 16:20 Emperor: reboot ms-be2059 to fix device enumeration order re [[phab:T295563|T295563]]
* 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 at 25%', diff saved to https://phabricator.wikimedia.org/P17906 and previous config saved to /var/cache/conftool/dbconfig/20211130-161457-jynus.json
* 16:13 XioNoX: cr2-codfw bounce fpc 1 pic 0 (vrrp backup) - [[phab:T289241|T289241]]
* 16:07 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1163 at 50%', diff saved to https://phabricator.wikimedia.org/P17905 and previous config saved to /var/cache/conftool/dbconfig/20211130-160748-jynus.json
* 16:06 bblack: lvs2007 - repooling into service
* 16:01 bblack: lvs2007 - depooling for network maint - do not push LVS config changes please!
* 15:41 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard2001.codfw.wmnet
* 15:41 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 15:38 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard2001.codfw.wmnet
* 15:37 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2001.codfw.wmnet
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:12 jforrester@deploy1002: Synchronized multiversion/MWMultiVersion.php: Add wikifunctions hard-coded value to setSiteInfoForWiki for Beta Cluster [[phab:T284162|T284162]] (duration: 00m 56s)
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:45 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:25 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17904 and previous config saved to /var/cache/conftool/dbconfig/20211130-131124-marostegui.json
* 13:05 topranks: Running homer against CR routers to adjust loopback4 filter enabling local NTP queries for status. [[phab:T296623|T296623]]
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17903 and previous config saved to /var/cache/conftool/dbconfig/20211130-125620-marostegui.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17902 and previous config saved to /var/cache/conftool/dbconfig/20211130-124115-marostegui.json
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'After maintenance db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17901 and previous config saved to /var/cache/conftool/dbconfig/20211130-122610-marostegui.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 ([[phab:T277354|T277354]])', diff saved to https://phabricator.wikimedia.org/P17900 and previous config saved to /var/cache/conftool/dbconfig/20211130-122555-marostegui.json
* 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance [[phab:T277354|T277354]]
* 12:09 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts puppetboard1001.eqiad.wmnet
* 12:02 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard1001.eqiad.wmnet
* 11:50 moritzm: running "sudo gnt-cluster renew-crypto --new-node-certificates --new-rapi-certificate --new-spice-certificate" for Ganeti codfw cluster [[phab:T296622|T296622]]
* 11:01 hnowlan: restarting tilerator, kartotherian and tileratorui for updates in eqiad
* 11:01 hnowlan: restarting tilerator, kartotherian and tileratorui in codfw
* 10:39 elukey: rollout wmf-certificates 0~20211129-1 fleet wide (add group/others permissions to the cert bundle)
* 10:30 lucaswerkmeister-wmde@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:29 lucaswerkmeister-wmde@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:58 moritzm: installing remaining ICU security updates
* 09:06 Amir1: dropping wikiadmin@localhost from all pooled replicas of s6 ([[phab:T296511|T296511]])
* 08:24 dcausse: restarting blazegraph on wdqs1006 (jvm stuck for 6hours)
* 08:14 Amir1: revoking DROP from wikiadmin on all pooled replicas ([[phab:T249683|T249683]])
* 03:46 ejegg: updated payments-wiki from {{Gerrit|dbc92132}} to {{Gerrit|4a4ef51d}}
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:17 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742524{{!}}Enable scroll tracking for all users (T292586)]] (duration: 00m 55s)
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:14 catrope@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/readingDepth.js: Backport: [[gerrit:742517{{!}}Provide fallback for config variable when not present]] (duration: 00m 55s)
* 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:13 catrope@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:738530{{!}}allow sysops to set/remove reviewer group on ckbwiki (T294696)]] (duration: 00m 55s)


== 2021-04-30 ==
== 2021-11-29 ==
* 21:54 mutante: people1003 - rsyncing /home from people1002
* 22:32 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/EntitySchema/src/MediaWiki/Specials/SetEntitySchemaLabelDescriptionAliases.php: Deploy security patch for [[phab:T296578|T296578]] (duration: 00m 55s)
* 15:30 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:25 bstorm: hard rebooting cloudmetrics1002 [[phab:T275605|T275605]]
* 22:20 sbassett@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/FileImporter/src/Remote/MediaWiki/HttpApiLookup.php: Backport: [[gerrit:742263{{!}}SECURITY: Fix special page displaying unescaped user input (T296605)]] (duration: 00m 56s)
* 11:40 ladsgroup@deploy1002: Synchronized static/favicon/wikitech.ico: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 56s)
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:36 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 56s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:34 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 20:46 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Fix wgWikiLambdaOrchestratorLocation service pointer typo (duration: 00m 55s)
* 11:33 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 20:27 tgr: UTC evening deploys done
* 11:31 ladsgroup@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 20:26 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:742261{{!}}GrowthExperiments: Start imagerecommendation variant experiment]] (duration: 00m 55s)
* 09:04 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
* 20:23 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/AddImageSubmissionHandler.php: Backport: [[gerrit:742262{{!}}AddImage: Refresh user's task feed after undecided rejection (T296491)]] (duration: 00m 56s)
* 09:03 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
* 20:21 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:742260{{!}}SuggestedEdits: Drop isActivated() check in getJsData (T296626)]] (duration: 00m 56s)
* 08:11 moritzm: remove mc1027 from debmonitor, server is broken and won't return ([[phab:T276415|T276415]])
* 20:17 ejegg: updated payments-wiki from {{Gerrit|d1d6f024}} -> {{Gerrit|dbc92132}}
* 07:38 moritzm: installing iputils updates from Buster point release
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15667 and previous config saved to /var/cache/conftool/dbconfig/20210430-061549-root.json
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15666 and previous config saved to /var/cache/conftool/dbconfig/20210430-060046-root.json
* 20:10 eileen: civicrm
* 05:51 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15665 and previous config saved to /var/cache/conftool/dbconfig/20210430-054542-root.json
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15664 and previous config saved to /var/cache/conftool/dbconfig/20210430-053038-root.json
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:16 marostegui: Upgrade kernel on db1114
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15663 and previous config saved to /var/cache/conftool/dbconfig/20210430-051558-marostegui.json
* 20:00 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T295705|T295705]] Move CirrusSearch traffic back to eqiad (duration: 00m 56s)
* 05:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1080.eqiad.wmnet
* 19:42 legoktm: uploaded php-yaml_2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1_amd64.changes to apt.wm.o ([[phab:T296331|T296331]])
* 04:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1080.eqiad.wmnet
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo systemctl restart wdqs-blazegraph`
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:43 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 19:16 vgutierrez: pool cp3064 - [[phab:T290005|T290005]]
* 04:43 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 18:55 bblack: repooling esams
* 04:42 ryankemper: [[phab:T261239|T261239]] `elastic2033`, which is known to be in a state of hardware failure (we have a ticket open), is holding up the reboot of codfw. I don't think we have a good way to exclude a node currently. Going to just proceed to `eqiad` for now
* 18:48 bblack: esams: shifting depool method to esams-offline (now that its config is fixed)
* 04:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 18:42 legoktm: depooling esams
* 04:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 18:17 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 04:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:742259{{!}}rdbms: Add DB host to TransactionProfiler logging and fix time fields (T295706)]] (duration: 00m 56s)
* 04:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:03 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 17:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:50 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1010.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:47 ryankemper: [[phab:T280563|T280563]] about half of codfw nodes have been rebooted before the failure caused by write queue not emptying fast enough, kicking it off again:`sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 17:40 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Initial Beta Cluster deployment of Wikifunctions: III - CS for [[phab:T289315|T289315]] (duration: 00m 55s)
* 17:38 vgutierrez: pool cp3064 - [[phab:T290005|T290005]]
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 17:25 lucaswerkmeister-wmde@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:22 jforrester@deploy1002: Synchronized wmf-config/ProductionServices.php: Initial Beta Cluster deployment of Wikifunctions: II - Services for [[phab:T289315|T289315]] (duration: 00m 55s)
* 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Initial Beta Cluster deployment of Wikifunctions: I - IS for [[phab:T289315|T289315]] (duration: 00m 55s)
* 17:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06d8d25f6e89be0b1692d017bdbc2c9524372c0b}}: foundationwiki: Remove explicit wmgUseOAuth (duration: 00m 57s)
* 16:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|bad34ed8d86b30eb4c240da0498ddfb44af30ea7}}: Make foundationwiki a standard CentralAuth wiki ([[phab:T205347|T205347]]) (duration: 00m 56s)
* 16:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|567f2a9d4883c9a98a3251f153ea0ad58d7774c6}}: Revert "foundationwiki: Set wmgLocalAuthLoginOnly=false temporarily" ([[phab:T205347|T205347]]) (duration: 00m 56s)
* 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2069.codfw.wmnet with OS buster
* 16:04 moritzm: sudo gnt-cluster upgrade --to 2.16 for Ganeti codfw cluster
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:52 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 15:51 James_F: Running mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=enwiki en wikimedia wikifunctionswiki wikifunctions.beta.wmflabs.org in Beta Cluster for [[phab:T284162|T284162]]
* 15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2069.codfw.wmnet with OS buster
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:47 papaul: power down logstash2028 for IDRAC reset
* 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:15 moritzm: gnt-cluster renew-crypto --new-cluster-certificate for codfw Ganeti cluster [[phab:T296622|T296622]]
* 14:40 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:38 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:37 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:55 vgutierrez: repool cp3064 - [[phab:T290005|T290005]]
* 12:51 moritzm: upgrading ganeti codfw cluster to 2.16 backport [[phab:T296622|T296622]]
* 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:32 vgutierrez: depool cp3064 - [[phab:T290005|T290005]]
* 12:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: {{Gerrit|05704407395fbf227eec47cf716393dc60a36a35}}: Fix error handling in SuggestedEdits::getActionData ([[phab:T296366|T296366]]) (duration: 05m 37s)
* 12:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7fdea3e71e4fd9e85c30efbc17f94c0711deb252}}: Add planet4589.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T296136|T296136]]) (duration: 00m 56s)
* 12:11 vgutierrez: pool cp3064 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS buster
* 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:07 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|4662224229cb4083b8b01de436ccd65e8c00e7dd}}: Remove search.wikimedia.org files ([[phab:T289224|T289224]]) (duration: 00m 56s)
* 11:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|5fc6aaa73202a1bf2aa58998d2671d5f4a6255bc}}: Fix "Mark entries as bot entries" feature (2/2; [[phab:T296297|T296297]]) (duration: 00m 55s)
* 10:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/CentralAuth/includes/Special/SpecialMultiLock.php: {{Gerrit|5fc6aaa73202a1bf2aa58998d2671d5f4a6255bc}}: Fix "Mark entries as bot entries" feature (1/2; [[phab:T296297|T296297]]) (duration: 00m 56s)
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d01652ec22f6cb3413b419a3c9b0a7a08d79960f}}: Disable Growth IP research survey ([[phab:T294568|T294568]]) (duration: 00m 56s)
* 10:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
* 10:45 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3064.esams.wmnet with OS buster
* 10:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS buster
* 10:01 vgutierrez: depool cp3064 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 09:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2041.codfw.wmnet with OS buster
* 09:52 vgutierrez: pool cp2041 with HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 09:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:34 moritzm: rolling restart of mediawiki canaries to pick up ICU security updates
* 09:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|3a892860b2e1e2ac7b60fc1c4dbdb2035d6af950}}: foundationwiki: Do not enable wmgUsePageViewInfo explicitly (duration: 00m 55s)
* 09:32 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=foundationwiki 'inactive' # removing nonexistent group; backup left at P17888
* 09:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|786313c06188d5d63700d7e46384ef99a9297b57}}: foundationwiki: Clear group add/remove declarations (duration: 00m 55s)
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c3f47dc55b67d2b53ec27bb610978ff8165aa6ca}}: foundationwiki: Disable hard redirects (duration: 00m 57s)
* 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2041.codfw.wmnet with OS buster
* 08:56 vgutierrez: depool cp2041 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 08:54 moritzm: installing ICU security updates on buster
* 08:33 moritzm: installing bluez security updates
* 08:26 moritzm: installing libvpx security updates
* 08:19 moritzm: installing libntlm security updates
* 08:07 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 07m 01s)
* 08:00 marostegui: Restart db2078 and db1117
* 08:00 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Upgrade of mwparserfromhell - [[phab:T296563|T296563]]
* 07:31 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] - (second attempt, no git update submodules the first time) (duration: 00m 04s)
* 07:31 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] - (second attempt, no git update submodules the first time)
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2014.codfw.wmnet with OS bullseye
* 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2014.codfw.wmnet with OS bullseye


== 2021-04-29 ==
== 2021-11-28 ==
* 23:36 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:683749{{!}}Revert "DEMO: Add newline to README"]] (duration: 00m 56s)
* 17:14 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 02m 11s)
* 23:18 ryankemper: [[phab:T280563|T280563]] successful reboot of `relforge100[3,4]`; `relforge` cluster is back to green status.
* 17:12 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]]
* 23:16 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:683747{{!}}DEMO: Add newline to README]] (duration: 00m 56s)
* 23:08 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts` (amended command)
* 23:06 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 23:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:46 ryankemper: [[phab:T280563|T280563]] Current master is `relforge1003-relforge-eqiad`, will reboot `1004` first then `1003` after
* 22:44 ryankemper: [[phab:T280563|T280563]] Bleh, we never moved the new config into spicerack, so it's trying to talk to the old relforge hosts which no longer exist. Will reboot relforge manually and use the cookbook for codfw/eqiad, and circle back later for the spicerack change
* 22:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:32 ryankemper: [[phab:T280563|T280563]] Spotted the issue; forgot to set `--without-lvs` for relforge reboot
* 22:27 ryankemper: [[phab:T280563|T280563]] `urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbe4bb8a518>: Failed to establish a new connection: [Errno -2] Name or service not known`
* 22:26 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:26 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 21:36 mutante: icinga - enabling disabled notifications for random an-worker nodes where mgmt interface had enabled alerts but the actual host didn't
* 21:32 mutante: icinga - enabled notifications for checks on ms-backup1001 - they were all manually disabled, but none of the checks had changed status in 50 days, which suggests re-enabling them was forgotten (a common issue with disabling notifications)
* 21:16 mutante: backup1001 - sudo check_bacula.py --icinga
* 20:54 marostegui: Stop mysql on tendril for the UTC night, dbtree and tendril will remain down for a few hours [[phab:T281486|T281486]]
* 20:16 marostegui: Restart tendril database - [[phab:T281486|T281486]]
* 20:00 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 19:46 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 08s)
* 19:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 19:32 dpifke@deploy1002: Finished deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484 (duration: 00m 05s)
* 19:32 dpifke@deploy1002: Started deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484
* 19:01 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki/wanobjectcache/revision_row_1/ (bad data from Sep 2019)
* 18:59 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/rl-minify-* (bad data from Aug 2018)
* 18:58 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki_ExternalGuidance_init_Google_tr_fr (bad data from Nov 2019)
* 18:38 krinkle@deploy1002: Synchronized php-1.37.0-wmf.1/includes/libs/objectcache/MemcachedBagOStuff.php: {{Gerrit|I926797a9d494a31}}, [[phab:T281480|T281480]] (duration: 01m 08s)
* 18:33 mutante: LDAP - added mmandere to wmf group ([[phab:T281344|T281344]])
* 18:10 krinkle@deploy1002: Synchronized php-1.37.0-wmf.3/includes/libs/objectcache/MemcachedBagOStuff.php: {{Gerrit|I926797a9d494a31}}, [[phab:T281480|T281480]] (duration: 01m 09s)
* 17:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:29 ryankemper: [[phab:T281498|T281498]] `sudo -E cumin 'C:role::lvs::balancer' 'sudo run-puppet-agent'`
* 16:28 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 16:27 liw@deploy1002: sync-wikiversions aborted: Revert "group[0{{!}}1] wikis to [VERSION]" (duration: 00m 01s)
* 16:22 ryankemper: [[phab:T281498|T281498]] `ryankemper@wdqs2004:~$ sudo depool`
* 16:20 ryankemper: [[phab:T281498|T281498]] `ryankemper@wdqs2004:~$ sudo run-puppet-agent`
* 16:18 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]] (duration: 02m 39s)
* 16:15 otto@deploy1002: Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]]
* 16:12 papaul: powerdown thanos-fe2001 for memory swap
* 15:44 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here)
* 15:43 ryankemper: [WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up
* 15:37 ryankemper: [WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001`
* 15:35 ryankemper: [WDQS] ^ scratch that, depooled `wdqs2001`
* 15:34 ryankemper: [WDQS] pooled `wdqs2001`
* 14:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
* 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
* 13:44 moritzm: installing Java security updates on stat* hosts
* 13:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
* 13:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
* 13:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
* 13:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
* 13:40 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]] (duration: 02m 59s)
* 13:37 otto@deploy1002: Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]]
* 13:11 moritzm: installing postgresql-11 security updates
* 13:08 jbond42: merge netbase change to manage /etc/services
* 13:07 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 13:06 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 12:36 Amir1: upgrading Quiddity to admin in mailman3
* 12:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
* 12:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
* 12:26 moritzm: installing grub2 updates from buster point release
* 12:06 jbond42: update debmonitor.discover.wmnet ssl cert
* 11:59 ladsgroup@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:683454{{!}}Undeploy JADE from production, Part III (T281418)]] (duration: 01m 07s)
* 11:54 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683453{{!}}Undeploy JADE from production, Part II (T281418)]], Part I (duration: 01m 06s)
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683452{{!}}Undeploy JADE from production, Part I (T281418)]] (duration: 01m 07s)
* 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 11:38 mbsantos@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683548{{!}}Enable suggested values in TemplateData and VisualEditor CommonSettings (T273857)]] (duration: 01m 07s)
* 11:34 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683534{{!}}Another fix for token cookie handling (T281346)]] (duration: 01m 07s)
* 11:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683533{{!}}Another fix for token cookie handling (T281346)]] (duration: 01m 08s)
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15658 and previous config saved to /var/cache/conftool/dbconfig/20210429-113211-root.json
* 11:24 mbsantos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683547{{!}}Enable suggested values in TemplateData and VisualEditor InitialiseSettings (T273857)]] (duration: 01m 07s)
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15657 and previous config saved to /var/cache/conftool/dbconfig/20210429-111708-root.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15656 and previous config saved to /var/cache/conftool/dbconfig/20210429-110204-root.json
* 10:59 moritzm: updating apt on buster (SUA 198), which eases bullseye upgrades [[phab:T275873|T275873]]
* 10:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683135{{!}}Fix CX token cookie (T281346)]] (duration: 01m 08s)
* 10:54 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683134{{!}}Fix CX token cookie (T281346)]] (duration: 01m 09s)
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15655 and previous config saved to /var/cache/conftool/dbconfig/20210429-104700-root.json
* 10:27 marostegui: Upgrade kernel on db1110
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15654 and previous config saved to /var/cache/conftool/dbconfig/20210429-102447-marostegui.json
* 09:42 volans: uploaded pynetbox 5.3.0-2 to bullseye-wikimedia on apt.w.o
* 09:39 volans@deploy1002: Finished deploy [homer/deploy@e394769]: Release v0.2.8 (duration: 03m 30s)
* 09:35 volans@deploy1002: Started deploy [homer/deploy@e394769]: Release v0.2.8
* 09:01 jynus: stop replication and checking data of db2100:s7
* 08:57 marostegui: Upgrade kernel on db2133
* 08:51 marostegui: Upgrade kernel on db2125
* 08:50 marostegui: Upgrade kernel on db2124
* 08:46 marostegui: Upgrade kernel on db2122
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 100%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15652 and previous config saved to /var/cache/conftool/dbconfig/20210429-084011-root.json
* 08:39 marostegui: Upgrade kernel on db2121
* 08:33 marostegui: Upgrade kernel on db2120
* 08:28 volans@deploy1002: Finished deploy [homer/deploy@89cd07c]: Release v0.2.7 (duration: 03m 08s)
* 08:27 marostegui: Upgrade kernel on db2115
* 08:25 volans@deploy1002: Started deploy [homer/deploy@89cd07c]: Release v0.2.7
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 80%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15651 and previous config saved to /var/cache/conftool/dbconfig/20210429-082507-root.json
* 08:19 marostegui: Upgrade kernel on db2114
* 08:12 marostegui: Upgrade kernel on db2109
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 70%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15649 and previous config saved to /var/cache/conftool/dbconfig/20210429-081004-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 60%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15648 and previous config saved to /var/cache/conftool/dbconfig/20210429-075500-root.json
* 07:54 marostegui: Upgrade kernel on db2089
* 07:48 jynus: rolling restart of bacula hosts [[phab:T273182|T273182]]
* 07:48 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 01m 07s)
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15647 and previous config saved to /var/cache/conftool/dbconfig/20210429-074625-root.json
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 50%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15646 and previous config saved to /var/cache/conftool/dbconfig/20210429-073956-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 90%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15645 and previous config saved to /var/cache/conftool/dbconfig/20210429-073122-root.json
* 07:28 marostegui: Stop mysql and upgrade kernel on pc1007
* 07:28 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 01m 08s)
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 40%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15644 and previous config saved to /var/cache/conftool/dbconfig/20210429-072453-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 80%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15643 and previous config saved to /var/cache/conftool/dbconfig/20210429-071618-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 25%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15642 and previous config saved to /var/cache/conftool/dbconfig/20210429-070949-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15641 and previous config saved to /var/cache/conftool/dbconfig/20210429-070114-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 10%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15640 and previous config saved to /var/cache/conftool/dbconfig/20210429-065445-root.json
* 06:53 godog: add 100G to prometheus/ops in eqiad
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 60%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15639 and previous config saved to /var/cache/conftool/dbconfig/20210429-064611-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15637 and previous config saved to /var/cache/conftool/dbconfig/20210429-063107-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 40%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15636 and previous config saved to /var/cache/conftool/dbconfig/20210429-061603-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 30%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15635 and previous config saved to /var/cache/conftool/dbconfig/20210429-060100-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15634 and previous config saved to /var/cache/conftool/dbconfig/20210429-054556-root.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 20%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15633 and previous config saved to /var/cache/conftool/dbconfig/20210429-053052-root.json
* 05:22 marostegui: Check tables on db1121 (this will cause lag on s4 commonswiki, on wikireplicas)
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for tables checking', diff saved to https://phabricator.wikimedia.org/P15632 and previous config saved to /var/cache/conftool/dbconfig/20210429-052146-marostegui.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 15%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15631 and previous config saved to /var/cache/conftool/dbconfig/20210429-051549-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15630 and previous config saved to /var/cache/conftool/dbconfig/20210429-050045-root.json
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15629 and previous config saved to /var/cache/conftool/dbconfig/20210429-045557-marostegui.json
* 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15627 and previous config saved to /var/cache/conftool/dbconfig/20210429-045015-marostegui.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15626 and previous config saved to /var/cache/conftool/dbconfig/20210429-044458-marostegui.json
* 04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
* 04:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15625 and previous config saved to /var/cache/conftool/dbconfig/20210429-043857-marostegui.json
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1156 to dbctl [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15624 and previous config saved to /var/cache/conftool/dbconfig/20210429-043812-marostegui.json
* 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for reimage', diff saved to https://phabricator.wikimedia.org/P15623 and previous config saved to /var/cache/conftool/dbconfig/20210429-042757-marostegui.json
* 02:59 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job (duration: 00m 06s)
* 02:59 milimetric@deploy1002: Started deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job
* 02:58 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b]: Hotfix for referrer job (duration: 14m 40s)
* 02:44 milimetric@deploy1002: Started deploy [analytics/refinery@740226b]: Hotfix for referrer job
* 01:44 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|I5869b3c3ba4a}} (duration: 01m 08s)
* 01:23 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 01:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:19 ryankemper: [[phab:T280382|T280382]] Aborted data transfer; `wdqs2007` is hosed (see https://phabricator.wikimedia.org/T281437)
* 01:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:40 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: [[phab:T281405|T281405]] (duration: 01m 08s)
* 00:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 00:06 ryankemper: [[phab:T280382|T280382]] `wdqs1013.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`


== 2021-04-28 ==
== 2021-11-27 ==
* 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:55 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]] (duration: 04m 14s)
* 23:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 19:51 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]]
* 23:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 19:47 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev (duration: 02m 01s)
* 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 19:45 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 12:22 elukey: drop /var/tmp/core files from ores100[2,4], root partition full
* 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 12:10 elukey: drop /var/tmp/core files from ores1009, root partition full
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 11:55 elukey: disable coredumps for ORES celery units (will cause a roll restart of all celeries) - [[phab:T296563|T296563]]
* 23:06 dpifke@deploy1002: Finished deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 (duration: 00m 05s)
* 11:46 elukey: drop ores coredumps from ores1008
* 23:06 dpifke@deploy1002: Started deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886
* 09:56 elukey: powercycle analytics1071, soft lockup stacktraces in the tty
* 22:44 dwisehaupt: civiproxy revision changed to {{Gerrit|99cecb924a}} - initial rollout of code for testing
* 09:51 elukey: move ores coredump files from /var/cache/tmp to /srv/coredumps on ores100[6,7,8] and ores2003 to free space on the root partition
* 22:26 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 22:26 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:18 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 22:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:44 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 21:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 21:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:39 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 21:37 ryankemper: [[phab:T280382|T280382]] `wdqs2007` is reachable again; glancing at `/srv/wdqs` its `wikidata.jnl` is `839G` when it should be `975G` so I'll re-do the wikidata journal transfer
* 21:32 ryankemper: [[phab:T280382|T280382]] [WDQS] `wdqs2007` ssh is unreachable; power cycling via `racadm>>racadm serveraction powercycle`
* 21:24 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (previous reimage timed out, instance appears to have rebooted)
* 21:07 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 21:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 21:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 20:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:57 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:13 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 07s)
* 19:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 18:21 legoktm: added mvolz as listadmin for services@ and reset admin pw ([[phab:T278516|T278516]])
* 17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Wikibase/client/includes/DataAccess/Scribunto/WikibaseLanguageIndependentLuaBindings.php: {{Gerrit|b392dba0d77904d7de819043e51d8c3fbf003873}}: Fix incorrect ItemId typehint in Lua bindings ([[phab:T281361|T281361]]) (duration: 01m 09s)
* 16:52 papaul: powerdown logstash2034 for relocation
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 16:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 16:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 16:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 16:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 16:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 16:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:20 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts conf[2001-2003].codfw.wmnet
* 15:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 15:03 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:00 moritzm: imported python-poolcounter 0.0.2-1+deb11u1 to apt.wikimedia.org [[phab:T275873|T275873]]
* 14:53 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[2001-2003].codfw.wmnet
* 14:44 moritzm: imported gitlab-ce 13.9.7-ce.0 to apt.wikimedia.org
* 14:40 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d] (duration: 04m 59s)
* 14:35 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d]
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d] (duration: 00m 06s)
* 14:34 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d]
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 03m 07s)
* 14:32 moritzm: installing iproute2 updates from buster point release
* 14:31 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 14:30 milimetric@deploy1002: deploy aborted: - (duration: 00m 00s)
* 14:30 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: -
* 14:30 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 12m 31s)
* 14:26 moritzm: installing net-snmp updates from buster point release
* 14:17 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 13:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 13:15 jayme: restarting pybal on lvs5001,lvs4005,lvs2007 - [[phab:T271573|T271573]]
* 13:14 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 13:10 jayme: restarting pybal on lvs5002,lvs4006,lvs2008 - [[phab:T271573|T271573]]
* 13:04 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 13:03 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 13:02 moritzm: upgrading deployment servers to PHP 7.4.32
* 12:55 moritzm: upgrading snapshot hosts to PHP 7.4.32
* 12:48 jayme: restarting pybal on lvs2009 - [[phab:T271573|T271573]]
* 12:45 moritzm: upgrading labweb to PHP 7.4.32
* 12:43 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:42 jayme: restarting pybal on lvs5003,lvs4007 - [[phab:T271573|T271573]]
* 12:39 jayme: restarting pybal on lvs2010 - [[phab:T271573|T271573]]
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 12:28 apergos: manually edited /srv/deployment/dumps/dumps-cache/config on snapshot1011,12,13 to change deploy1001 to deploy1002 (where did it get the old value from? these are new installs!)
* 12:16 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:15 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 11:53 jayme: switching SRV record _etcd._tcp to new etcd cluster (for codfw, eqsin, ulsfo)
* 11:22 Urbanecm: EU B&C window done
* 11:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: {{Gerrit|8d0ae5e8fedefa911fc216bfc810d7a6169ea7e5}}: Separate reference preview settings in beta & non-beta ([[phab:T281235|T281235]]) (duration: 01m 08s)
* 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddbc378e41783356e28cd90bbefa08624ea2844c}}: Enable partial action blocks on testwiki ([[phab:T280528|T280528]]) (duration: 01m 07s)
* 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 10:44 jbond42: updated the check-raid nrpe script to python3
* 09:40 moritzm: restarting Tomcat on idp-test1001 to pick up Java security updates
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15618 and previous config saved to /var/cache/conftool/dbconfig/20210428-092103-root.json
* 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1001.wikimedia.org
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint1001.wikimedia.org
* 09:09 moritzm: restarting jenkins* on releases to pick up Java security updates
* 09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15617 and previous config saved to /var/cache/conftool/dbconfig/20210428-090559-root.json
* 08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15616 and previous config saved to /var/cache/conftool/dbconfig/20210428-085056-root.json
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 01m 08s)
* 08:41 urbanecm@deploy1002: sync-file aborted: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 00m 02s)
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15615 and previous config saved to /var/cache/conftool/dbconfig/20210428-083625-marostegui.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15614 and previous config saved to /var/cache/conftool/dbconfig/20210428-083552-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15613 and previous config saved to /var/cache/conftool/dbconfig/20210428-083458-root.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15612 and previous config saved to /var/cache/conftool/dbconfig/20210428-082625-root.json
* 08:25 effie: update php7.2 on jobrunners and parsoid servers && rolling  php7.2-fpm restarts
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15611 and previous config saved to /var/cache/conftool/dbconfig/20210428-081121-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15610 and previous config saved to /var/cache/conftool/dbconfig/20210428-075618-root.json
* 07:52 effie: update php7.2 on api servers && rolling  php7.2-fpm restarts
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15609 and previous config saved to /var/cache/conftool/dbconfig/20210428-074114-root.json
* 07:40 marostegui: Deploy schema change on db1098:3316 and db1098:3317 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 07:27 effie: update php7.2 on appservers && rolling  php7.2-fpm restarts
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for schema change and kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15608 and previous config saved to /var/cache/conftool/dbconfig/20210428-072609-marostegui.json
* 07:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:14 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:12 elukey: add AAAA record for kafka-main200[3,4,5].codfw.wmnet
* 07:10 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:05 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:04 elukey: add AAAA record for kafka-main2002.codfw.wmnet
* 07:03 marostegui: Deploy schema change on db2089:3316 and db1098:3316 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:26 legoktm: created mailman3 superusers for Administrator (noc@), Ladsgroup and Legoktm
* 06:23 legoktm: legoktm@lists1001:~$ sudo mailman-web set_default_site --name lists.wikimedia.org --domain lists.wikimedia.org
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15607 and previous config saved to /var/cache/conftool/dbconfig/20210428-061426-root.json
* 06:00 marostegui: Stop MySQL on db2096 (x1 codfw) [[phab:T281135|T281135]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15606 and previous config saved to /var/cache/conftool/dbconfig/20210428-055922-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1167 in s8 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15605 and previous config saved to /var/cache/conftool/dbconfig/20210428-055144-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15604 and previous config saved to /var/cache/conftool/dbconfig/20210428-054419-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15603 and previous config saved to /var/cache/conftool/dbconfig/20210428-052915-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15602 and previous config saved to /var/cache/conftool/dbconfig/20210428-051526-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 (old s1 master) for schema change', diff saved to https://phabricator.wikimedia.org/P15601 and previous config saved to /var/cache/conftool/dbconfig/20210428-050754-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json
* 05:00 marostegui: Starting s1 eqiad failover from db1083 to db1163 - [[phab:T278214|T278214]]
* 04:14 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 04:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:08 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 04:08 marostegui: Start replication changes, connect everything to db1163 [[phab:T278214|T278214]]
* 04:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json
* 03:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:49 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
* 03:48 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet
* 03:33 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 02:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:28 robh@cumin1001: START - Cookbook sre.dns.netbox
* 01:06 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 00:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE


== 2021-11-26 ==
* 16:11 arnoldokoth: drain kubestage1002 node in prep for decommissioning
* 16:05 arnoldokoth: drain kubestage1001 node


== 2021-04-27 ==
* 23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 23:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE
* 23:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 23:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 23:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 21:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet
* 20:55 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet
* 20:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet
* 20:42 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet
* 20:32 bblack: re-pooling codfw public traffic - [[phab:T279457|T279457]]
* 20:11 jhuneidi@deploy1002: Synchronized php-1.37.0-wmf.3/includes/rcfeed/IRCColourfulRCFeedFormatter.php: Backport rcfeed: Remove reference assignment ([[phab:T281226|T281226]]) to 1.37.0-wmf.3 (duration: 01m 12s)
* 20:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 20:06 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 19:44 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 19:35 papaul: powerdown ms-backup2001  for maintenance
* 19:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 19:07 papaul: powerdown logstash2035  for maintenance
* 19:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 19:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
* 18:50 mutante: people1003 - destroying VM and recreating again from scratch to test if issue of no console and no access is repeatable
* 18:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
* 18:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 18:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 18:33 mutante: people1003 - rebooting, trying to get new VM to work
* 18:33 Urbanecm: Morning B&C window done
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|91a85f2}}: {{Gerrit|ac770bf}}: Enable language in header for office and testwiki users ([[phab:T280526|T280526]]) (duration: 01m 19s)
* 18:32 bblack: lvs2009 - restart pybal + re-run puppet agent - [[phab:T279457|T279457]]
* 18:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:20 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[56].codfw.wmnet
* 18:20 bblack: cp203[56] - repooling in etcd - [[phab:T279457|T279457]]
* 18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:17 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:17 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:16 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:12 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:11 bblack: dns2001 - restarting bird to repool, then re-enabling puppet - [[phab:T279457|T279457]]
* 18:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:02 ejegg: update payments-wiki from {{Gerrit|9a4eef1375}} to {{Gerrit|44570561f2}}
* 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 17:58 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 17:34 papaul: powerdown moss-fe2001  for maintenance
* 17:32 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:25 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:23 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:21 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:19 ryankemper: [[phab:T281215|T281215]] Banned `elastic2043` from codfw cirrussearch cluster
* 17:16 mbsantos@deploy1002: helmfile


== 2021-04-12 ==
== 2021-11-25 ==
* 23:25 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|I390b4726d01037107}} (duration: 00m 58s)
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17872 and previous config saved to /var/cache/conftool/dbconfig/20211125-204357-ladsgroup.json
* 23:06 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:657696{{!}}wgAbuseFilterAflFilterMigrationStage: Make COMPAT_NEW in production (T269712)]] (duration: 00m 58s)
* 20:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|117743f695b5cd4b9fa99ff8aaa00d3f9a1d8889}}: Enable assignment of importupload on enwikibooks ([[phab:T278683|T278683]]) (duration: 00m 57s)
* 20:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 18:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a1949fd15a5a6fe745f6b807b2716ccb2a287476}}: Add extendedconfirmed on svwiki ([[phab:T279836|T279836]]) (duration: 00m 59s)
* 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d275ec5378c5faea356aeb0fc985f4d815efed1}}: Add abusefilter-maintainer to wmgPrivilegedGlobalGroups ([[phab:T279835|T279835]]) (duration: 00m 58s)
* 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|13b10d3b1b0b3ff48077a7d212a0eddd6214ce22}}: Enable <mapframe> on bswiki ([[phab:T279635|T279635]]) (duration: 00m 57s)
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17871 and previous config saved to /var/cache/conftool/dbconfig/20211125-192850-ladsgroup.json
* 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ae05f7cd53925c06d8a23cb8f667a20d79ce2cff}}: Replace ombudsman with ombuds in wmgPrivilegedGlobalGroups ([[phab:T256299|T256299]]) (duration: 00m 57s)
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17870 and previous config saved to /var/cache/conftool/dbconfig/20211125-191345-ladsgroup.json
* 18:03 urbanecm@deploy1002: sync-file aborted: {{Gerrit|ae05f7cd53925c06d8a23cb8f667a20d79ce2cff}}: Replace ombudsman with ombuds in wmgPrivilegedGlobalGroups ([[phab:T256299|T256299]]ú (duration: 00m 00s)
* 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17869 and previous config saved to /var/cache/conftool/dbconfig/20211125-185841-ladsgroup.json
* 11:29 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:678569{{!}}Disable legacy javascript in jawiki (T72470)]] (duration: 00m 56s)
* 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17868 and previous config saved to /var/cache/conftool/dbconfig/20211125-184336-ladsgroup.json
* 11:26 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.38/extensions/FlaggedRevs/frontend/FlaggedRevsXML.php: Backport: [[gerrit:678347{{!}}Don't do strict equal condition check (T279750)]] (duration: 00m 57s)
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17867 and previous config saved to /var/cache/conftool/dbconfig/20211125-172714-ladsgroup.json
* 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NO-OP: {{Gerrit|6c03d6a59086fa42ec4fc9d289c819a4d3b8e052}}: Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD ([[phab:T279853|T279853]]) (duration: 00m 58s)
* 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:677928{{!}}wikidata: post edit constraint jobs on 60% of edits (T204031)]] (duration: 01m 13s)
* 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:677920{{!}}Remove all remains of idGeneratorLogging (T274156)]] (2/2, Beta-only) (duration: 00m 56s)
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17866 and previous config saved to /var/cache/conftool/dbconfig/20211125-172707-ladsgroup.json
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:677920{{!}}Remove all remains of idGeneratorLogging (T274156)]] (1/2) (duration: 00m 57s)
* 17:12 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference
* 11:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:677560{{!}}Remove idGeneratorLogging (T274156)]] (duration: 00m 58s)
* 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17864 and previous config saved to /var/cache/conftool/dbconfig/20211125-171202-ladsgroup.json
* 11:00 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:678574{{!}}Bumping portals to master (T279398 T279419)]] (duration: 00m 58s)
* 16:57 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6 (duration: 06m 59s)
* 10:59 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:678574{{!}}Bumping portals to master (T279398 T279419)]] (duration: 00m 58s)
* 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17863 and previous config saved to /var/cache/conftool/dbconfig/20211125-165657-ladsgroup.json
* 09:55 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 00m 57s)
* 16:50 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6
* 09:44 Urbanecm: Start server-side upload for 4 video files #2 ([[phab:T279878|T279878]], [[phab:T279839|T279839]], [[phab:T279818|T279818]])
* 16:49 jynus@cumin1001: dbctl commit (dc=all): 'Fully repool db1163', diff saved to https://phabricator.wikimedia.org/P17862 and previous config saved to /var/cache/conftool/dbconfig/20211125-164941-jynus.json
* 08:43 Urbanecm: Start server-side upload for 4 video files ([[phab:T279878|T279878]], [[phab:T279839|T279839]], [[phab:T279818|T279818]])
* 16:46 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next (duration: 01m 04s)
* 08:08 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw1318.eqiad.wmnet
* 16:45 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next
* 08:07 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw1334.eqiad.wmnet
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17861 and previous config saved to /var/cache/conftool/dbconfig/20211125-164153-ladsgroup.json
* 08:07 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1311.eqiad.wmnet
* 16:18 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163++', diff saved to https://phabricator.wikimedia.org/P17860 and previous config saved to /var/cache/conftool/dbconfig/20211125-161833-jynus.json
* 08:06 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1318.eqiad.wmnet
* 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163+', diff saved to https://phabricator.wikimedia.org/P17859 and previous config saved to /var/cache/conftool/dbconfig/20211125-161404-jynus.json
* 08:06 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1334.eqiad.wmnet
* 16:10 klausman: restarting pybal on lvs2009 [[phab:T289835|T289835]]
* 08:05 vgutierrez: restart acme-chief
* 15:57 vgutierrez: restarting pybal on lvs2010 - [[phab:T289835|T289835]]
* 15:55 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P17856 and previous config saved to /var/cache/conftool/dbconfig/20211125-155538-jynus.json
* 15:47 jynus: reenable gtid on db1163
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17853 and previous config saved to /var/cache/conftool/dbconfig/20211125-152906-ladsgroup.json
* 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17852 and previous config saved to /var/cache/conftool/dbconfig/20211125-152858-ladsgroup.json
* 15:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1001.eqiad.wmnet
* 15:19 klausman@cumin1001: conftool action : set/pooled=yes:weight=1; selector: cluster=ml_serve,service=kubesvc
* 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17851 and previous config saved to /var/cache/conftool/dbconfig/20211125-151354-ladsgroup.json
* 15:13 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping1001.eqiad.wmnet
* 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3001.esams.wmnet
* 15:05 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping3001.esams.wmnet
* 15:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2001.codfw.wmnet
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17850 and previous config saved to /var/cache/conftool/dbconfig/20211125-145849-ladsgroup.json
* 14:54 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping2001.codfw.wmnet
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17849 and previous config saved to /var/cache/conftool/dbconfig/20211125-144344-ladsgroup.json
* 14:42 XioNoX: Update ping redirect to point to new ping VMs - [[phab:T295767|T295767]]
* 14:25 jayme: uncordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet - [[phab:T293729|T293729]]
* 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:12 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1002.eqiad.wmnet
* 13:32 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping1002.eqiad.wmnet
* 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2002.codfw.wmnet
* 13:28 Amir1: killing lingering process from mwmaint to depooled db1147
* 13:20 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping2002.codfw.wmnet
* 13:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3002.esams.wmnet
* 13:05 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping3002.esams.wmnet
* 12:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
* 12:14 arturo: update repo bullseye-wikimedia/thirdparty/ceph-octopus ([[phab:T296175|T296175]])
* 12:14 jynus: disable temp. gtid on db1163
* 12:11 jynus@cumin1001: dbctl commit (dc=all): 'Temp. depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17847 and previous config saved to /var/cache/conftool/dbconfig/20211125-121138-jynus.json
* 12:04 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load even more', diff saved to https://phabricator.wikimedia.org/P17846 and previous config saved to /var/cache/conftool/dbconfig/20211125-120435-jynus.json
* 11:56 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
* 11:56 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load', diff saved to https://phabricator.wikimedia.org/P17845 and previous config saved to /var/cache/conftool/dbconfig/20211125-115602-jynus.json
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17844 and previous config saved to /var/cache/conftool/dbconfig/20211125-110443-ladsgroup.json
* 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17843 and previous config saved to /var/cache/conftool/dbconfig/20211125-110435-ladsgroup.json
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17842 and previous config saved to /var/cache/conftool/dbconfig/20211125-104930-ladsgroup.json
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17841 and previous config saved to /var/cache/conftool/dbconfig/20211125-103425-ladsgroup.json
* 10:25 vgutierrez: rolling restart of varnish and HAProxy on cp2042.codfw.wmnet,cp1090.eqiad.wmnet,cp[5012].eqsin.wmnet,cp3065.esams.wmnet,cp[4026,4032].ulsfo.wmnet to disable PROXY protocol - [[phab:T290005|T290005]]
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17840 and previous config saved to /var/cache/conftool/dbconfig/20211125-101921-ladsgroup.json
* 09:55 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxh
* 09:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 09:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:39 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 09:29 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 09:27 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 09:24 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 09:23 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 09:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 09:19 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 09:16 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 09:10 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 09:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:59 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:51 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:50 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17837 and previous config saved to /var/cache/conftool/dbconfig/20211125-084834-ladsgroup.json
* 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:47 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 08:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:18 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:17 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:14 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 08:13 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:09 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:08 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 08:03 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:00 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:57 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 07:56 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
* 07:51 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(echostore{{!}}sessionstore)
* 07:49 marostegui: Stop mysql on db1133 to clone db1128 as a test host [[phab:T295965|T295965]]
* 07:49 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 07:48 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 07:47 jayme: elevated MediaWiki exceptions and fatals (from ~07:35) due to a mistake during re-deploy of eventgate-main
* 07:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 07:35 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:29 elukey_: elukey@mwdebug2002:~$ sudo systemctl reset-failed ifup@ens5.service
* 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
* 07:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:20 jelto@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntax
* 07:17 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 32 hosts with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 07:17 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 32 hosts with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 07:10 jelto: downtime PyBal backends health check on lvs1015 and lvs1016 for helm3 de-deploy [[phab:T251305|T251305]]. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
* 07:09 jelto: start re-deploy procedure in eqiad Kubernetes [[phab:T251305|T251305]]
* 06:31 marostegui: Restart tendril's DB
* 05:51 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 04:45 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS (duration: 05m 27s)
* 04:43 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.93` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
* 04:40 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS
* 04:39 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:35 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7]: 0.3.93 (duration: 09m 23s)
* 04:30 ryankemper: [Elastic] Cleaning up dangling apt packages: `ryankemper@cumin1001:~$ sudo cumin -b 4 'elastic*' 'sudo apt autoremove -y'`
* 04:27 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.93` on canary `wdqs1003`; proceeding to rest of fleet
* 04:25 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7]: 0.3.93
* 04:25 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.93`. Pre-deploy tests passing on canary `wdqs1003`
* 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS buster
* 02:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS buster
* 02:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS buster
* 02:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS buster
* 02:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS buster
* 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS buster
* 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS buster
* 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS buster
* 01:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS buster
* 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS buster
* 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS buster


== 2021-04-10 ==
== 2021-11-24 ==
* 14:21 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: fix for [[phab:T279699|T279699]] (duration: 04m 12s)
* 23:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS buster
* 14:17 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: fix for [[phab:T279699|T279699]]
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS buster
* 14:11 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for [[phab:T279699|T279699]] (duration: 02m 21s)
* 23:44 mutante: [puppetmaster1001:~] $ sudo puppet cert sign gitlab-runner1001.eqiad.wmnet {{!}} sudo install_console gitlab-runner1001.eqiad.wmnet ([[phab:T295481|T295481]])
* 14:08 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for [[phab:T279699|T279699]]
* 23:26 mutante: ganeti - bringing up new VM - sudo gnt-instance start gitlab-runner1001.eqiad.wmnet ; ran puppet on install1003; installing OS [[phab:T295481|T295481]]
* 14:08 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for [[phab:T279699|T279699]] (duration: 00m 11s)
* 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS buster
* 14:08 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for [[phab:T279699|T279699]]
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS buster
* 23:09 mutante: mwmaint1002 - sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size 1M -delete - to fix Icinga alert about large files in client bucket
* 23:08 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
* 23:03 mutante: wcqs1001 - sudo systemctl restart wcqs-blazegraph - after <+jinxer-wm> (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wcqs1001:9195 is burning free allocators
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
* 22:50 mutante: Creating a new Ganeti VM and wondering which row to put it in? [ganeti1009:~] $ for row in A B C D; do echo "row $<nowiki>{</nowiki>row<nowiki>}</nowiki>: $(sudo gnt-instance list -o name -F "pnode.group == 'row_$<nowiki>{</nowiki>row<nowiki>}</nowiki>'" {{!}} wc -l) VMs"; done
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.wikimedia.org
* 22:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS buster
* 22:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS buster
* 22:38 mutante: running decom cookbook on gitlab-runner1001.wikimedia.org VM, which was in state "ADMIN_down" and not used yet, to make room to recreate it as gitlab-runner1001.eqiad.wmnet [[phab:T295481|T295481]]
* 22:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.wikimedia.org
* 22:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS buster
* 22:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS buster
* 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 legoktm@deploy1002: Synchronized wmf-config/: Improve docs on $wmgUseGlobalAbuseFilters and sort list of wikis (duration: 00m 57s)
* 21:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS buster
* 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS buster
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:54 legoktm@deploy1002: Synchronized wmf-config/: Update configuration related to disabling Score functionality (duration: 00m 57s)
* 20:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS buster
* 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17834 and previous config saved to /var/cache/conftool/dbconfig/20211124-194857-ladsgroup.json
* 19:38 razzi: `sudo maintain-views --all-databases --replace-all` on clouddb1018 for [[phab:T292594|T292594]]
* 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17833 and previous config saved to /var/cache/conftool/dbconfig/20211124-193352-ladsgroup.json
* 19:19 razzi: run `maintain-views --all-databases --replace-all` on clouddb1013 for [[phab:T292594|T292594]]
* 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17832 and previous config saved to /var/cache/conftool/dbconfig/20211124-191847-ladsgroup.json
* 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17831 and previous config saved to /var/cache/conftool/dbconfig/20211124-190343-ladsgroup.json
* 18:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2002.codfw.wmnet
* 18:51 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2002.codfw.wmnet
* 18:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2001.codfw.wmnet
* 18:43 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
* 18:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief-test2001.codfw.wmnet
* 18:36 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief-test2001.codfw.wmnet
* 18:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2001.codfw.wmnet
* 18:30 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2001.codfw.wmnet
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17830 and previous config saved to /var/cache/conftool/dbconfig/20211124-174723-ladsgroup.json
* 17:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17829 and previous config saved to /var/cache/conftool/dbconfig/20211124-174615-ladsgroup.json
* 17:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:741134{{!}}rdbms: Add full query to transaction profiler (T295706)]] (duration: 00m 56s)
* 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:34 jhathaway@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=puppetboard
* 17:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17828 and previous config saved to /var/cache/conftool/dbconfig/20211124-173110-ladsgroup.json
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2016.codfw.wmnet
* 17:22 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
* 17:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2016.codfw.wmnet
* 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM chartmuseum2001.codfw.wmnet
* 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2015.codfw.wmnet
* 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2015.codfw.wmnet
* 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM chartmuseum2001.codfw.wmnet
* 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17827 and previous config saved to /var/cache/conftool/dbconfig/20211124-171604-ladsgroup.json
* 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2006.codfw.wmnet
* 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2004.codfw.wmnet
* 17:08 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
* 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2004.codfw.wmnet
* 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2006.codfw.wmnet
* 17:05 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399] (duration: 06m 45s)
* 17:05 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2003.codfw.wmnet
* 17:01 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2003.codfw.wmnet
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17826 and previous config saved to /var/cache/conftool/dbconfig/20211124-170100-ladsgroup.json
* 17:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2005.codfw.wmnet
* 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399]
* 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399] (duration: 00m 07s)
* 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399]
* 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399] (duration: 32m 50s)
* 16:56 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2005.codfw.wmnet
* 16:50 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:44 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2005.codfw.wmnet
* 16:43 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:41 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2005.codfw.wmnet
* 16:41 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:40 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:38 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2006.codfw.wmnet
* 16:36 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2002.codfw.wmnet
* 16:36 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2006.codfw.wmnet
* 16:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:741132{{!}}rdbms: Make TransactionProfiler logs more useful (T295706)]] (duration: 00m 57s)
* 16:33 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2002.codfw.wmnet
* 16:33 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2004.codfw.wmnet
* 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2003.codfw.wmnet
* 16:31 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2004.codfw.wmnet
* 16:29 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2003.codfw.wmnet
* 16:25 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2001.codfw.wmnet
* 16:25 mforns@deploy1002: Started deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399]
* 16:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
* 16:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2001.codfw.wmnet
* 16:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
* 16:19 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
* 16:16 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
* 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 Amir1: start of "foreachwikiindblist s3 migrateRevisionActorTemp.php --sleep=2" in mwmaint1002 in a screen. It will take a month or so ([[phab:T275246|T275246]])
* 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:00 btullis: systemctl reset-failed ifup@ens5.service on schema2004 [[phab:T273026|T273026]]
* 15:48 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2004.codfw.wmnet
* 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17821 and previous config saved to /var/cache/conftool/dbconfig/20211124-154533-ladsgroup.json
* 15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17820 and previous config saved to /var/cache/conftool/dbconfig/20211124-154236-ladsgroup.json
* 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafkamon2002.codfw.wmnet
* 15:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2004.codfw.wmnet
* 15:36 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2003.codfw.wmnet
* 15:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kafkamon2002.codfw.wmnet
* 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc2001.wikimedia.org
* 15:34 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2003.codfw.wmnet
* 15:32 papaul: reboot ms-be2058 for firmware upgrade
* 15:31 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc2001.wikimedia.org
* 15:30 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2001.codfw.wmnet
* 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17819 and previous config saved to /var/cache/conftool/dbconfig/20211124-152731-ladsgroup.json
* 15:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2001.codfw.wmnet
* 15:21 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dragonfly-supernode2001.codfw.wmnet
* 15:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM dragonfly-supernode2001.codfw.wmnet
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab2001.wikimedia.org
* 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17817 and previous config saved to /var/cache/conftool/dbconfig/20211124-151226-ladsgroup.json
* 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow2001.codfw.wmnet
* 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow2001.codfw.wmnet
* 14:59 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17815 and previous config saved to /var/cache/conftool/dbconfig/20211124-145721-ladsgroup.json
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader2001.codfw.wmnet
* 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader2001.codfw.wmnet
* 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 14:39 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2031.codfw.wmnet
* 14:36 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2031.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2001.wikimedia.org
* 14:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:32 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:31 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2030.codfw.wmnet
* 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:28 godog: systemctl reset-failed ifup@ens5.service on logstash2024 [[phab:T273026|T273026]]
* 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2001.wikimedia.org
* 14:26 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2030.codfw.wmnet
* 14:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp2001.wikimedia.org
* 14:21 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2025.codfw.wmnet
* 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:15 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2025.codfw.wmnet
* 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2001.wikimedia.org
* 14:10 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2024.codfw.wmnet
* 14:06 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2001.wikimedia.org
* 14:00 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2024.codfw.wmnet
* 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM serpens.wikimedia.org
* 13:55 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2023.codfw.wmnet
* 13:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM serpens.wikimedia.org
* 13:49 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2023.codfw.wmnet
* 13:41 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2006.codfw.wmnet
* 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:39 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2006.codfw.wmnet
* 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17813 and previous config saved to /var/cache/conftool/dbconfig/20211124-133809-ladsgroup.json
* 13:37 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2005.codfw.wmnet
* 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17812 and previous config saved to /var/cache/conftool/dbconfig/20211124-133628-ladsgroup.json
* 13:36 XioNoX: add Jayme r/o user to all network devices
* 13:35 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2005.codfw.wmnet
* 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2005.wikimedia.org
* 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2004.codfw.wmnet
* 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2005.wikimedia.org
* 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
* 13:27 filippo@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM logstash2004.codfw.wmnet
* 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp2001.wikimedia.org
* 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp2001.wikimedia.org
* 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17811 and previous config saved to /var/cache/conftool/dbconfig/20211124-131519-ladsgroup.json
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17810 and previous config saved to /var/cache/conftool/dbconfig/20211124-130200-ladsgroup.json
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt2001.wikimedia.org
* 12:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt2001.wikimedia.org
* 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana2001.codfw.wmnet
* 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana2001.codfw.wmnet
* 12:48 jbond: enable puppet post puppetdb reboot
* 12:48 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb2002.codfw.wmnet
* 12:46 jelto@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxh
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17809 and previous config saved to /var/cache/conftool/dbconfig/20211124-124420-ladsgroup.json
* 12:43 jbond@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb2002.codfw.wmnet
* 12:37 jbond: disable puppet for puppetdb reboot
* 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2002.wikimedia.org
* 12:29 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2002.wikimedia.org
* 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2001.wikimedia.org
* 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2001.wikimedia.org
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases2002.codfw.wmnet
* 12:23 awight: EU scap deployment finished
* 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases2002.codfw.wmnet
* 12:21 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:737195{{!}}Replace global with parent scope]] (duration: 00m 55s)
* 12:16 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:737193{{!}}[lint] fully-qualify classname]] (duration: 00m 55s)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb2001.codfw.wmnet
* 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb2001.codfw.wmnet
* 12:10 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:740766{{!}}VisualEditor template dialog: new sidebar and inline descriptions (T284203, T286992)]] (duration: 00m 57s)
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox2001.wikimedia.org
* 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:03 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox2001.wikimedia.org
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox-dev2001.wikimedia.org
* 12:02 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 12:01 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 11:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox-dev2001.wikimedia.org
* 11:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki2002.codfw.wmnet
* 11:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM rpki2002.codfw.wmnet
* 11:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2003.codfw.wmnet
* 11:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 11:49 moritzm: systemctl reset-failed ifup@ens5.service on poolcounter2003 [[phab:T273026|T273026]]
* 11:48 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2003.codfw.wmnet
* 11:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2004.codfw.wmnet
* 11:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 11:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2004.codfw.wmnet
* 11:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:35 godog: bounce apache2 on logstash1025
* 11:35 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:32 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 Amir1: optimizing image.commonswiki in db1141 ([[phab:T296143|T296143]])
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17808 and previous config saved to /var/cache/conftool/dbconfig/20211124-112539-ladsgroup.json
* 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2004.codfw.wmnet
* 11:23 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2004.codfw.wmnet
* 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2003.codfw.wmnet
* 11:15 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2003.codfw.wmnet
* 11:13 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2002.codfw.wmnet
* 11:05 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2002.codfw.wmnet
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2001.codfw.wmnet
* 10:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 10:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 10:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2001.codfw.wmnet
* 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui2001.codfw.wmnet
* 10:48 XioNoX: rollback: disable ping-offload for codfw - [[phab:T294119|T294119]]
* 10:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui2001.codfw.wmnet
* 10:47 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:46 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM people2002.codfw.wmnet
* 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM people2002.codfw.wmnet
* 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping2001.codfw.wmnet
* 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping2001.codfw.wmnet
* 10:27 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:25 XioNoX: disable ping-offload for codfw - [[phab:T294119|T294119]]
* 10:24 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:17 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:14 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:13 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:12 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:06 jelto: downtime PyBal backends health check for helm3 de-deploy [[phab:T251305|T251305]]. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2002.codfw.wmnet
* 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2002.codfw.wmnet
* 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 10:02 vgutierrez: repool cp5006 - [[phab:T290005|T290005]]
* 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2001.codfw.wmnet
* 10:00 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2001.codfw.wmnet
* 09:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor2002.codfw.wmnet
* 09:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor2002.codfw.wmnet
* 09:54 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:53 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 09:53 vgutierrez: restart varnish/haproxy on cp5006 - [[phab:T290005|T290005]]
* 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 09:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install2003.wikimedia.org
* 09:49 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 09:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install2003.wikimedia.org
* 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx2001.wikimedia.org
* 09:45 vgutierrez: depool cp5006 - [[phab:T290005|T290005]]
* 09:43 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 09:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx2001.wikimedia.org
* 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM planet2002.codfw.wmnet
* 09:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM planet2002.codfw.wmnet
* 09:30 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=apple-search,name=eqiad
* 09:24 jelto@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxhighlight{{!}}she
* 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid2002.codfw.wmnet
* 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid2002.codfw.wmnet
* 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 09:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM deneb.codfw.wmnet
* 09:08 _joe_: switching search.wikimedia.org to be served by the apple-search service
* 09:04 jelto: start re-deploy procedure in codfw Kubernetes [[phab:T251305|T251305]]
* 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM deneb.codfw.wmnet
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:56 _joe_: repooling cp2027
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:741082{{!}}Set actor migration to write both on all wikis (T275246)]] (duration: 00m 57s)
* 08:51 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:41 vgutierrez: depool cp2027
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 07:23 elukey: reboot kubernetes1018 (role::insetup) to verify negotiated speed of eth interface
* 07:12 elukey: drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8 and other blockmgr-* dirs on stat1006 to free space on the root partition
* 06:47 Amir1: running optimize table with replication on db1155:3314 ([[phab:T296143|T296143]])
* 06:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance ([[phab:T296143|T296143]])
* 06:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance ([[phab:T296143|T296143]])
* 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17807 and previous config saved to /var/cache/conftool/dbconfig/20211124-063228-root.json
* 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17806 and previous config saved to /var/cache/conftool/dbconfig/20211124-061725-root.json
* 06:05 marostegui: Upgrade db1128's kernel [[phab:T288720|T288720]]
* 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17805 and previous config saved to /var/cache/conftool/dbconfig/20211124-060221-root.json
* 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17804 and previous config saved to /var/cache/conftool/dbconfig/20211124-054718-root.json
* 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2012.codfw.wmnet with OS buster


== 2021-04-09 ==
== 2021-11-23 ==
* 14:07 jynus: retry es4 backup dump on eqiad (backup1002)
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS buster
* 01:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-be2002.codfw.wmnet with reason: REIMAGE
* 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2011.codfw.wmnet with OS buster
* 01:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2002.codfw.wmnet with reason: REIMAGE
* 23:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS buster
* 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-be2001.codfw.wmnet with reason: REIMAGE
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS buster
* 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2001.codfw.wmnet with reason: REIMAGE
* 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS buster
* 00:49 legoktm: imported mailman3 backports on apt.wm.o ([[phab:T278905|T278905]])
* 22:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2009.codfw.wmnet with OS buster
* 21:58 tgr: UTC evening deploys done
* 21:57 tgr@deploy1002: Finished scap: (no justification provided) (duration: 10m 03s)
* 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 21:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2009.codfw.wmnet with OS buster
* 21:53 krinkle@deploy1002: Finished deploy [integration/docroot@a3435a7]: (no justification provided) (duration: 00m 07s)
* 21:53 krinkle@deploy1002: Started deploy [integration/docroot@a3435a7]: (no justification provided)
* 21:47 tgr@deploy1002: Started scap: (no justification provided)
* 21:47 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740777{{!}}Add Image: Validate GEInfoboxTemplates size (T294518)]] (duration: 00m 56s)
* 21:39 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Api/ApiQueryGrowthTasks.php: Backport: [[gerrit:740776{{!}}Structured task caching/filtering cherry-picks step 3]] (duration: 00m 55s)
* 21:35 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740775{{!}}Structured task caching/filtering cherry-picks step 2]] (duration: 00m 57s)
* 21:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:04 legoktm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Echo/: re-enable cross-wiki notifications by default ([[phab:T296270|T296270]]) (duration: 00m 57s)
* 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:51 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|7d5f779a73594bb11f359bda055f2c7af8e92feb}}: Structured task caching/filtering cherry-picks, step 1 (duration: 00m 56s)
* 19:42 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|c26e407118e1cd8e1e3fea6e2f4e3e43a609ea62}}: GrowthExperiments backports (duration: 01m 03s)
* 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 2/2) (duration: 00m 56s)
* 19:17 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 1/2) (duration: 00m 56s)
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3993aacbfdbbfb6cdcc198ce369bf08b32ace865}}: Increase reading depth sampling rate to .1% ([[phab:T294777|T294777]]) (duration: 00m 57s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:29 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:25 ejegg: updated SmashPig standalone (IPN listener) from {{Gerrit|be68299b}} -> {{Gerrit|211f8e65}}
* 18:18 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:18 cmjohnson1: upgrading msw-c1-eqiad [[phab:T259758|T259758]]
* 18:04 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:01 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:00 moritzm: systemctl reset-failed ifup@ens5.service on durum2001 [[phab:T273026|T273026]]
* 17:59 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:55 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 17:49 mutante: miscweb1002 - rm -rf /srv/deployments/scholarships ([[phab:T243037|T243037]])
* 17:47 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 17:42 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 17:35 ebernhardson: [[phab:T295478|T295478]] start snapshot of commonswiki_file from cirrus codfw -> swift eqiad
* 17:34 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 17:33 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 17:31 cmjohnson1: upgrading msw's in row D eqiad [[phab:T259758|T259758]]
* 17:28 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2012.codfw.wmnet with OS stretch
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2002.codfw.wmnet
* 17:14 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:14 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:11 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2002.codfw.wmnet
* 17:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2001.codfw.wmnet
* 17:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2001.codfw.wmnet
* 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb2002.codfw.wmnet
* 16:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb2002.codfw.wmnet
* 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc2001.codfw.wmnet
* 16:53 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc2001.codfw.wmnet
* 16:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2012.codfw.wmnet with OS stretch
* 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2001.codfw.wmnet
* 16:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2001.codfw.wmnet
* 16:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 16:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2011.codfw.wmnet with OS stretch
* 16:13 cmjohnson1: updating mgmt switches in row C, racks C2-C8 eqiad [[phab:T259758|T259758]]
* 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
* 15:46 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2010.codfw.wmnet with OS stretch
* 15:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:27 Emperor: rolling restart of thanos frontends [[phab:T294380|T294380]]
* 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
* 14:40 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 14:34 jbond@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=puppetboard
* 14:30 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:00 marostegui: Failover m5 from db1128 to db1132 - [[phab:T288720|T288720]]
* 14:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
* 13:50 godog: powercycle (again) ms-be2058
* 13:48 godog: add 80G to prometheus global in eqiad
* 13:31 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
* 13:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
* 13:01 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
* 12:52 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1002-dev.eqiad.wmnet
* 12:46 Lucas_WMDE: UTC morning backport+config window done
* 12:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:43 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1002-dev.eqiad.wmnet
* 12:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:737503{{!}}Set up beta test environment for QuickSurveys (T293798)]] (beta only) (duration: 00m 55s)
* 12:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: [[gerrit:740784{{!}}OSD: Handle cases where the image srcset attr is not set (T296260)]] (duration: 00m 56s)
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:26 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: [[gerrit:740778{{!}}OSD: Add a ready hook for scripts (T180569)]] (duration: 00m 56s)
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 11:54 btullis@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
* 11:51 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart (exit_code=97) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 11:51 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2002.codfw.wmnet
* 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2002.codfw.wmnet
* 11:25 godog: powercycle ms-be2058 - down and nothing on console
* 11:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5012.eqsin.wmnet with OS buster
* 11:15 vgutierrez: pool cp5012 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 Amir1: start of mwscript migrateRevisionActorTemp.php --wiki=testwiki --sleep=5 ([[phab:T275246|T275246]])
* 11:05 jayme: cordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet (we have issues with POD IP prefix allocation) - [[phab:T293729|T293729]]
* 11:05 jayme: uncordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet (we have issues with POD IP prefix allocation) - [[phab:T293729|T293729]]
* 11:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:02 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:740807{{!}}Set test wikis to write both for actor temp table migration (T275246)]] (duration: 00m 56s)
* 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17800 and previous config saved to /var/cache/conftool/dbconfig/20211123-102234-ladsgroup.json
* 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:19 urbanecm@deploy1002: Finished scap: {{Gerrit|c98acaa2ab27e630c0a1b55a64fb81b131c087f9}}: Backport localisation updates (duration: 11m 06s)
* 10:19 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:08 urbanecm@deploy1002: Started scap: {{Gerrit|c98acaa2ab27e630c0a1b55a64fb81b131c087f9}}: Backport localisation updates
* 10:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5012.eqsin.wmnet with OS buster
* 10:01 vgutierrez: depool cp5012 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 09:57 jayme: cordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet - [[phab:T293729|T293729]]
* 09:52 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1124.eqiad.wmnet with OS bullseye
* 09:27 Amir1: dropping useless GRANTs on s6 eqiad replicas without replication ([[phab:T296274|T296274]])
* 09:16 Amir1: dropping useless GRANTs on s6 eqiad master without replication ([[phab:T296274|T296274]])
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bullseye
* 09:05 Amir1: fixing incorrect grants of wikiadmin on localhost in s6 master in codfw with replication
* 07:52 topranks: Adjusting BGP on cr1-eqiad and cr2-eqiad to keep MED unchanged in iBGP.
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 05:29 ryankemper: [[phab:T295705|T295705]] Downtimed `elastic2044` for one hour and doing a full reboot for good measure. Already ran the plugin upgrade: `DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins`
* 05:26 ryankemper: [[phab:T295705|T295705]] Rolling restart of `codfw` complete. `elastic2044` was manually restarted earlier today so the cookbook didn't restart it (b/c we pass in a datetime cutoff threshold) so I'm manually upgrading and restarting that host
* 05:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 04:17 ryankemper: [[phab:T295705|T295705]] Properly disabled the sane-itizer; we don't want it running until after we (a) complete rolling restarts and (b) restore the missing `commonswiki_file` index (which is blocked on the restarts)
* 03:42 Amir1: ladsgroup@mwmaint1002:~$ cat broken_imgs {{!}} xargs -I <nowiki>{</nowiki><nowiki>}</nowiki> mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --verbose --mime 'image/*' --force --batch-size 1 --sleep 1 --start=<nowiki>{</nowiki><nowiki>}</nowiki> --end=<nowiki>{</nowiki><nowiki>}</nowiki> ([[phab:T296001|T296001]])
* 03:37 Amir1: rebuilding metadata of all djvu files outside of commons ([[phab:T296001|T296001]])
* 03:06 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:58 ryankemper: [[phab:T295705|T295705]] `elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.codfw.wmnet', port=9243): Read timed out. (read timeout=60))` Probably transient failure; will wait 10 mins and try again
* 02:57 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:55 ryankemper: [[phab:T295705|T295705]] `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation codfw "codfw plugin upgrade + restart" --upgrade --nodes-per-run 2 --start-datetime 2021-11-18T18:55:54 --task-id [[phab:T295705|T295705]]` on tmux `rolling_restarts_codfw`
* 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:17 urbanecm: UTC late window done
* 01:17 urbanecm@deploy1002: Finished scap: {{Gerrit|69aa4a7}}: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 4/4) (duration: 25m 50s)
* 01:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:51 urbanecm@deploy1002: Started scap: {{Gerrit|69aa4a7}}: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 4/4)
* 00:50 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/autoload.php: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 3/4) (duration: 00m 55s)
* 00:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specials/: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 2/4) (duration: 00m 55s)
* 00:48 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specialpage/SpecialPageFactory.php: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 1/4) (duration: 00m 56s)
* 00:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9209433dfc8b1f81a165ec75867337800db24b1}}: Enable reading depth instrumentation at low sampling rate ([[phab:T294777|T294777]]) (duration: 00m 56s)
* 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:30 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents: {{Gerrit|3f860c7}}: {{Gerrit|fa9fbf1}}: WikimediaEvents backports (2/2; [[phab:T294777|T294777]]) (duration: 00m 55s)
* 00:28 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/extension.json: {{Gerrit|3f860c72bca817c40486b90f0d8e0ffca72b2690}}: Restore ReadingDepth instrument (1/2) (duration: 00m 56s)
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/739908


== 2021-04-08 ==
== 2021-11-22 ==
* 23:48 brennen@deploy1002: Synchronized php-1.36.0-wmf.38/extensions/WikibaseMediaInfo/resources/mediasearch-vue/store/actions.js: Backport: [[gerrit:677956{{!}}Do not show "invalid search" message when request is aborted by user (TT277714)]] (duration: 00m 57s)
* 23:55 mutante: acmechief1001, acmechief-test1001: sudo systemctl restart reload-acme-chief-backend.timer
* 22:12 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 23:54 mutante: acmechief1001, acmechief-test1001: sudo systemctl start reload-acme-chief-backend.timer
* 22:12 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 23:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2011.codfw.wmnet with OS stretch
* 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 23:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2010.codfw.wmnet with OS stretch
* 21:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 23:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
* 21:56 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS stretch
* 21:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 22:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
* 21:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS stretch
* 21:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS buster
* 21:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 21:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS buster
* 21:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS buster
* 21:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 21:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS buster
* 21:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:48 robh@