
Server Admin Log: Difference between revisions

imported>Stashbot
(jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T245787 [nlwiki] Add noindex for NS_USER and NS_USER_TALK (duration: 00m 56s))
imported>Stashbot
(sukhe: disable puppet on dns4003 till we resolve the puppet failures)
 
(871 intermediate revisions by 4 users not shown)
== 2022-10-05 ==
* 00:05 sukhe: disable puppet on dns4003 till we resolve the puppet failures

== 2020-02-20 ==
* 23:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245787|T245787]] [nlwiki] Add noindex for NS_USER and NS_USER_TALK (duration: 00m 56s)
* 23:46 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgVectorPrintLogo for back-compat., not read since wmf.19 (duration: 00m 56s)
* 23:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw232[0-4].codfw.wmnet
* 23:45 mutante: gerrit1002 - test VM - rebooting for new disk
* 23:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw231[7-9].codfw.wmnet
* 23:33 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw232[0-4].codfw.wmnet
* 23:32 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw231[7-9].codfw.wmnet
* 23:32 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2381[7-9].codfw.wmnet
* 23:25 mutante: ganeti1003 - adding another virtual 20G disk to gerrit1002 ([[phab:T243808|T243808]])
* 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:04 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/includes/pager/IndexPager.php: IndexPager: Limit offset params to the max of the indices available (duration: 00m 56s)
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:28 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad
* 22:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8908dd1]: daemons: Install stack printing signal handler on SIGUSR1 (duration: 05m 05s)
* 22:23 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8908dd1]: daemons: Install stack printing signal handler on SIGUSR1
* 21:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245780|T245780]] [mediawikiwiki] Deny the 'flow-hide' right to logged out and non-autoconfirmed users (duration: 00m 56s)
* 20:07 James_F: Train 1.35.0-wmf.20 provisionally looks OK on all wikis. Closing [[phab:T233868|T233868]].
* 20:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.20
* 19:55 twentyafterfour: hotfix deployed
* 19:51 twentyafterfour: deploying phabricator hotfix:  https://phabricator.wikimedia.org/rPHEX2f36eee7ce67eb0c09e9bb0e79b42fc3b41d3597 for [[phab:T244165|T244165]]
* 19:33 bblack: codfw+ulsfo repooled in geodns
* 18:20 fdans@deploy1001: Finished deploy [analytics/refinery@e05ae16]: deploying refinery (duration: 11m 31s)
* 18:08 fdans@deploy1001: Started deploy [analytics/refinery@e05ae16]: deploying refinery
* 17:38 bblack: pushed codfw+ulsfo geodns depool
* 16:45 jynus: stop, upgrade and restart dbprov2002
* 16:26 jynus: stop, upgrade and restart dbprov1002
* 16:23 moritzm: installing Java security updates on Hadoop/Kafka Jumbo/AQS/Druid
* 16:16 jynus: stop, upgrade and restart db1140
* 16:12 moritzm: installing postgres security updates on netboxdb*
* 16:03 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@125cffa]: deploying aqs, third time is the charm (duration: 06m 15s)
* 15:57 fdans@deploy1001: Started deploy [analytics/aqs/deploy@125cffa]: deploying aqs, third time is the charm
* 15:40 marostegui: Poweroff es2022 [[phab:T245714|T245714]]
* 15:32 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@95a7999]: deploying aqs (duration: 00m 48s)
* 15:32 fdans@deploy1001: Started deploy [analytics/aqs/deploy@95a7999]: deploying aqs
* 15:23 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@cbc3241]: deploying aqs (duration: 04m 06s)
* 15:19 fdans@deploy1001: Started deploy [analytics/aqs/deploy@cbc3241]: deploying aqs
* 14:38 Urbanecm: [dry-run; mwmaint1002] foreachwiki extensions/AbuseFilter/maintenance/fixOldLogEntries.php --dry-run --verbose ([[phab:T228655|T228655]])
* 12:53 moritzm: installing PHP updates on matomo1001/piwik
* 12:28 moritzm: installing PHP 7.0 security updates
* 12:11 Urbanecm: EU SWAT done
* 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|728d739}}: Configure logo for ngwikimedia ([[phab:T242416|T242416]]) (duration: 01m 04s)
* 12:05 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|64240e1}}: Add logos for ngwikimedia ([[phab:T242416|T242416]]) (duration: 01m 04s)
* 11:19 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 11:08 moritzm: installing boost update from Buster point release
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10468 and previous config saved to /var/cache/conftool/dbconfig/20200220-105117-marostegui.json
* 10:12 Reedy: created $wikidb.blobs_cluster27 on es1023 - [[phab:T245720|T245720]]
* 10:08 Reedy: created $wikidb.blobs_cluster26 on es1020 - [[phab:T245720|T245720]]
* 10:08 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 04s)
* 09:42 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 03s)
* 09:27 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 01s)
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10467 and previous config saved to /var/cache/conftool/dbconfig/20200220-091233-marostegui.json
* 09:02 akosiaris: restart etherpad-lite on etherpad1002 [[phab:T244238|T244238]]
* 09:00 marostegui: Restart m1 database master db1135 (etherpad will not be available for around 1 minute) - [[phab:T244238|T244238]]
* 08:40 jynus: disable puppet and stop bacula service [[phab:T244238|T244238]]
* 08:35 marostegui: Upgrade mysql on db1135 without restart [[phab:T244238|T244238]]
* 07:47 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q15k (was Q10k) ([[phab:T225057|T225057]]) - in case of cache issues (duration: 01m 03s)
* 07:46 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q15k (was Q10k) ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 07:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q10k (was Q8k) ([[phab:T225057|T225057]]) - in case of cache issue (duration: 01m 01s)
* 07:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q10k (was Q8k) ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 07:17 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8000 ([[phab:T225057|T225057]]) - in case of cache issue (duration: 01m 03s)
* 07:15 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8000 ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 07:01 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6000 ([[phab:T225057|T225057]]) - extra sync for cache issue (duration: 01m 04s)
* 07:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6000 ([[phab:T225057|T225057]]) (duration: 01m 06s)
* 06:46 vgutierrez: test trafficserver 8.0.6-rc1 in cp30[64,65]
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10466 and previous config saved to /var/cache/conftool/dbconfig/20200220-062445-marostegui.json
* 06:17 marostegui: Repool labsdb1011
* 06:12 marostegui: Remove partitions from db1101:3318 - [[phab:T239453|T239453]]
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 to remove revision partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10465 and previous config saved to /var/cache/conftool/dbconfig/20200220-061213-marostegui.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 this host already had the partitions removed - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10464 and previous config saved to /var/cache/conftool/dbconfig/20200220-061019-marostegui.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 to remove revision partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10463 and previous config saved to /var/cache/conftool/dbconfig/20200220-060914-marostegui.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 on s8, db1099:3318 back to its original weight', diff saved to https://phabricator.wikimedia.org/P10462 and previous config saved to /var/cache/conftool/dbconfig/20200220-055943-marostegui.json
* 00:22 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571860{{!}}Allow non-autoconfirmed users to propose OAuth apps (T213760)]] (duration: 01m 04s)
* 00:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573397{{!}}Enable password-reset (requireemail pref) on test WD and Commons (T245660)]] (duration: 01m 03s)


== 2020-02-19 ==
== 2022-10-04 ==
* 23:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw138[0-3].eqiad.wmnet
* 23:09 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw137[4-9].eqiad.wmnet
* 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet
* 21:28 cjming: end of UTC late backport window
* 23:28 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: cirrus: Reduce CirrusSearch-MoreLike cache workers and queue back to normal (duration: 01m 03s)
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:26 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw138[0-3].eqiad.wmnet
* 21:25 cjming@deploy1002: Finished scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] (duration: 05m 06s)
* 23:26 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw137[4-9].eqiad.wmnet
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:25 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1363.eqiad.wmnet
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:23 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: redirect more_like from codfw back to eqiad (duration: 01m 04s)
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:21 cjming@deploy1002: cjming and cjming: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:20 cjming@deploy1002: Started scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]]
* 23:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:57 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c16c63a]: articletopic thresholding for ores scores and eventgate port update (duration: 00m 57s)
* 21:07 cjming@deploy1002: Finished scap: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]] (duration: 05m 40s)
* 22:56 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c16c63a]: articletopic thresholding for ores scores and eventgate port update
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:54 robh: cp3050 & cp3051 returned to service via [[phab:T243167|T243167]]
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgServer to protocol-relative for Wikitech and Test Wikitech (duration: 01m 05s)
* 21:01 cjming@deploy1002: cjming and mdsshakil: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:37 robh: taking cp3050 & cp3051 offline for firmware update via [[phab:T243167|T243167]]
* 21:01 cjming@deploy1002: Started scap: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]]
* 22:23 mutante: phabricator - upgrading PHP packages
* 20:59 cjming@deploy1002: Finished scap: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]] (duration: 06m 35s)
* 22:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw231([0-6]).codfw.wmnet
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:12 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw231([0-6]).codfw.wmnet
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:11 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(6[4-9]{{!}}7[0-3]{{!}}84).eqiad.wmnet
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:10 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw13(6[4-9]{{!}}7[0-3]{{!}}84).eqiad.wmnet
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:08 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2314.codfw.wmnet
* 20:53 cjming@deploy1002: cjming and trainbranchbot: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:52 cjming@deploy1002: Started scap: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]]
* 21:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:54 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:49 cjming@deploy1002: Sync cancelled.
* 21:52 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:48 bblack: all authdns servers - upgrade to gdnsd-3.2.2
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:42 cjming@deploy1002: cjming and aishik: Backport for [[gerrit:838207{{!}}Add wordmark and tagline for Bengali Wikibooks (T319320)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:41 cjming@deploy1002: Started scap: Backport for [[gerrit:838207{{!}}Add wordmark and tagline for Bengali Wikibooks (T319320)]]
* 21:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:39 cjming@deploy1002: Finished scap: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]] (duration: 14m 29s)
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:25 cjming@deploy1002: cjming and d3r1ck01: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:25 cjming@deploy1002: Started scap: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]]
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:54 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS buster
* 21:29 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 18:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:34 mutante: gerrit - deploying puppet refactoring change
* 20:55 eileen: civicrm revision changed from {{Gerrit|52c68911c6}} to {{Gerrit|a6b222c19f}}, config revision is {{Gerrit|561ae21f77}}
* 18:34 tzatziki: removing 1 file for legal compliance
* 20:15 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 06s)
* 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
* 20:13 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:12 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 06s)
* 18:24 tzatziki: removing 1 file for legal compliance
* 20:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 05s)
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:05 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.20 (duration: 01m 03s)
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.20
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:02 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 18:21 moritzm: installing gdk-pixbuf security updates
* 20:02 rzl@cumin1001: conftool action : set/weight=10; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 18:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 19:54 rlazarus: scap pull on new api servers mw13[56-62]
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:50 mutante: generating mcrouter certs for new codfw mw appservers
* 18:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:39 mutante: initial puppet run on new hosts mw231*
* 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:31 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/skins/MinervaNeue/includes/MinervaHooks.php: [[phab:T245162|T245162]] Check title value before proceeding to check if user page (duration: 01m 04s)
* 17:59 ejegg: turned fundraising scheduled jobs back on
* 19:27 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/skins/MinervaNeue/includes/MinervaHooks.php: [[phab:T245162|T245162]] Check title value before proceeding to check if user page (duration: 01m 04s)
* 17:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:21 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: [[phab:T244577|T244577]] [metawiki] Disable MobileFrontend mainpage special casing (duration: 01m 04s)
* 17:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]] (duration: 06m 58s)
* 19:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244369|T244369]] [trwiki] Enable the WikidataPageBanner extension (duration: 01m 05s)
* 17:55 moritzm: installing libsndfile security updates
* 19:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: [[phab:T245570|T245570]] resourceloader: fix SqlDependencyModuleStore::setMulti() to use upsert() (duration: 01m 01s)
* 17:50 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 18:45 bblack: dns4001 - upgraded to gdnsd-3.2.2
* 17:50 urbanecm@deploy1002: Started scap: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]]
* 18:44 bblack: reprepro: upload gdnsd 3.2.2-1~wmf1 to buster-wikimedia
* 17:49 ejegg: turned off fundraising scheduled jobs for civi deploy
* 18:39 mutante: mwmaint1002 - sudo systemctl reset-failed to clear systemd alerts
* 17:28 tzatziki: removing 4 files for legal compliance
* 18:38 mutante: mwmaint1002 - removing Icinga ACK for systemd state - comments for it were from HHVM removal in Oct 2019
* 17:04 mutante: gerrit - deployed 832345 - scap and daemon users became decoupled ([[phab:T317412|T317412]])
* 18:26 mutante: phab2001 - upgraded ssh-server, kept locally modified config; apt autoremove removes python3-debconf
* 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:23 mutante: phab2001 - installing package upgrades, incl. openssh, PHP version
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:22 mutante: phab2001 - upgrading mariadb client package versions
* 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:19 mutante: removing problem ACK from Icinga alerts for wikitech-static MediaWiki version. comments were about things in 2019
* 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:48 robh: cp1089 cp1090 returned to service via [[phab:T243167|T243167]]
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:40 jynus: starting data check between db1078 and db1140:3313 [[phab:T244958|T244958]]
* 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q4000 ([[phab:T225057|T225057]]) (just in case of cache issue) (duration: 01m 04s)
* 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q4000 ([[phab:T225057|T225057]]) (duration: 01m 01s)
* 16:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 ema: cp4026: repool after probe Connection:keep-alive experiment revert https://gerrit.wikimedia.org/r/573337
* 16:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:12 robh: cp1088 returned to service, cp1089 & cp1090 offline for firmware update via [[phab:T243167|T243167]]
* 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:44 papaul: replacing ps1-a8-codfw mgmt in rack A8 will go down
* 16:25 brennen@deploy1002: Pruned MediaWiki: 1.40.0-wmf.2 (duration: 02m 02s)
* 16:37 otto@deploy1001: Finished deploy [analytics/refinery@e23918a]: Updating eventgate-analytics port ([[phab:T245203|T245203]]) and also eventlogging whitelist (duration: 12m 27s)
* 16:24 brennen@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]] (duration: 28m 55s)
* 16:32 ema: depool cp4026, 5xx
* 16:21 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns4003.wikimedia.org with OS bullseye
* 16:24 otto@deploy1001: Started deploy [analytics/refinery@e23918a]: Updating eventgate-analytics port ([[phab:T245203|T245203]]) and also eventlogging whitelist
* 16:03 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 16:13 marostegui: Depool labsdb1011 to help replication to catch up
* 16:00 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 16:05 elukey: Update analytics-in4 filter term eventgate for [[phab:T245203|T245203]] on cr1/cr2 eqiad
* 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:48 ariel@deploy1001: Finished deploy [dumps/dumps@b42acb5]: fix temp stub generation, add pagerangeinfo cache, some unit tests (duration: 00m 03s)
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:48 ariel@deploy1001: Started deploy [dumps/dumps@b42acb5]: fix temp stub generation, add pagerangeinfo cache, some unit tests
* 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:59 marostegui: Stop mysql on es2021 - [[phab:T243052|T243052]]
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS buster
* 14:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 brennen@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 14:29 marostegui: Data checksum on db1084 [[phab:T245621|T245621]]
* 15:53 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 14:07 marostegui: Upgrade and reboot db1084 - [[phab:T245621|T245621]]
* 15:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 14:02 marostegui: Start mysql on db1084 without replication - [[phab:T245621|T245621]]
* 15:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 13:53 jbond42: disable puppet to upgrade postgresql
* 15:51 brennen: restarting `/usr/bin/scap stage-train --yes auto` after failed staging ([[phab:T314193|T314193]]), cc: ^demon
* 13:30 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1084, lots of connection errors', diff saved to https://phabricator.wikimedia.org/P10458 and previous config saved to /var/cache/conftool/dbconfig/20200219-133057-jynus.json
* 15:48 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 12:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573236{{!}}Start reading for the new term store for clients up to Q2000 (T225057)]], take II, the cache issue (duration: 01m 04s)
* 15:47 sukhe: disable Puppet on A:cp and A:eqiad for [[phab:T309651|T309651]]
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573236{{!}}Start reading for the new term store for clients up to Q2000 (T225057)]] (duration: 01m 06s)
* 15:42 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 11:56 volans: better splay of periodic scripts that interact with Netbox - [[phab:T244291|T244291]]
* 15:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
* 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:08 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib/includes/Store: Get rid of useless metrics in EntityTermLookupBase ([[phab:T245592|T245592]]) (duration: 01m 04s)
* 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 11:06 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib/includes/Store: Get rid of useless metrics in EntityTermLookupBase ([[phab:T245592|T245592]]) (duration: 01m 12s)
* 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS buster
* 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:58 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:58 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 15:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
* 10:45 jynus: upgrading mariadb client on cumin hosts
* 15:10 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3315, db2089:3316 after new package testing', diff saved to https://phabricator.wikimedia.org/P10457 and previous config saved to /var/cache/conftool/dbconfig/20200219-103806-marostegui.json
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2002.codfw.wmnet with OS buster
* 10:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 15:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 10:17 jynus: stopping db2089 mariadb@s5
* 15:06 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 10:12 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw135[0-5]*.eqiad.wmnet
* 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:12 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw135[0-5]*.eqiad.wmnet
* 15:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:11 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1349.eqiad.wmnet
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:11 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw1349.eqiad.wmnet
* 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:09 moritzm: updated tftpboot environment for stretch-bootif for the 9.12 point release [[phab:T241359|T241359]]
* 15:02 moritzm: installing snakeyaml security updates
* 09:53 jynus: stopping and upgrading db1140 instances
* 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3315, db2089:3316 for new package testing', diff saved to https://phabricator.wikimedia.org/P10455 and previous config saved to /var/cache/conftool/dbconfig/20200219-095139-marostegui.json
* 14:55 papaul: maintenance complete on msw1-codfw
* 09:51 marostegui: Depool db2089:3315, db2089:3316 for new package testing
* 14:51 sukhe: disable Puppet on A:cp and A:esams for [[phab:T309651|T309651]]
* 09:49 akosiaris: [[phab:T245516|T245516]]. Deploy mathoid chart version 0.0.27, removing logstash gelf configuration
* 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
* 09:46 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 14:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
* 09:43 vgutierrez: test trafficserver 8.0.6-rc1 in cp40[26,32]
* 14:40 moritzm: installing maven-shared-utils security updates
* 09:34 _joe_: cleared opcache on mw1313
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2002.codfw.wmnet with OS buster
* 09:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
* 09:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 14:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
* 09:33 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 14:30 papaul: ongoing maintenance on msw1-codfw
* 08:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:29 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 14:27 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 08:50 marostegui: Remove dbproxy1007 grants from m2 - [[phab:T231280|T231280]]
* 14:22 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 08:41 marostegui: Remove wikiadmin2 user from s7 - [[phab:T243512|T243512]]
* 14:14 XioNoX: netbox - Move VRRP IPs to FHRP group feature - [[phab:T311218|T311218]]
* 08:23 Urbanecm: run mwscript deleteEqualMessages.php cswiki --delete
* 14:13 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 08:14 godog: roll restart swift proxies - [[phab:T244776|T244776]]
* 14:12 filippo@cumin1001: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 07:02 marostegui: Remove wikiadmin2 user from es2 - [[phab:T243512|T243512]]
* 14:12 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/tests/phpunit/: Backport: [[gerrit:838094{{!}}Revert "Introduce LanguageVariantConverter" (T319282)]] (2/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 52s)
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 50 -> 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10454 and previous config saved to /var/cache/conftool/dbconfig/20200219-065726-marostegui.json
* 14:12 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 06:35 marostegui: Compress watchlist_expiry table on s3 (this will take hours as I have left a 60-second sleep between tables) - [[phab:T245358|T245358]]
* 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/includes/: Backport: [[gerrit:838094{{!}}Revert "Introduce LanguageVariantConverter" (T319282)]] (1/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 43s)
* 06:17 marostegui: Compress new and empty watchlist_expiry table - [[phab:T245358|T245358]]
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1353.eqiad.wmnet
* 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/extensions/Kartographer/modules/dialog: Backport: [[gerrit:838097{{!}}Log basic nearby and fullscreen events (T315972, T318678)]] (no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 42s)
* 01:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 01:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1354.eqiad.wmnet
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:22 mutante: mw1353 - restarted apache (some race condition on new installs, 5 other servers did not have the issue)
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1350.eqiad.wmnet
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1351.eqiad.wmnet
* 13:55 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 01:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1352.eqiad.wmnet
* 13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1355.eqiad.wmnet
* 13:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1354.eqiad.wmnet
* 13:49 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1350.eqiad.wmnet
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35347 and previous config saved to /var/cache/conftool/dbconfig/20221004-134947-root.json
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1353.eqiad.wmnet
* 13:49 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1351.eqiad.wmnet
* 13:48 sukhe: disable Puppet on A:cp and A:eqsin for [[phab:T309651|T309651]]
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1352.eqiad.wmnet
* 13:47 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 01:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:42 awight: EU backport window finished.
* 01:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T240728|T240728]] Fix Latin Wikipedia (VICIPÆDIA) wordmark and set size correctly (duration: 01m 06s)
* 13:40 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 01:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:38 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 00:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
* 00:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:43 James_F: Manually purged https://en.wikipedia.org/images/mobile/copyright/wikipedia-wordmark-la.svg and .png from Varnish for [[phab:T240728|T240728]]
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:41 jforrester@deploy1001: Synchronized static/images/mobile/copyright/: [[phab:T240728|T240728]] Sync logo images (duration: 01m 04s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:40 mutante: mw1351 through mw1355 - initial puppet runs - new appservers
* 13:36 awight@deploy1002: Finished scap: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]] (duration: 06m 49s)
* 00:36 niharika29@deploy1001: Synchronized static/images/mobile/copyright/: Remove unnecessary id from wordmark (duration: 01m 03s)
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:34 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adjust MT Threshold for Assamese to 70% - [[phab:T245509|T245509]] (duration: 01m 04s)
* 13:35 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
* 00:24 niharika29@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikimediaEvents/: Follow up on authevents statsd changes in {{Gerrit|I7612b68fe}} (duration: 01m 03s)
* 13:35 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "filippo test - filippo@cumin1001"
* 00:21 niharika29@deploy1001: Synchronized wmf-config/logging.php: Update authmanager-statsd channel name (duration: 01m 03s)
* 13:34 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "filippo test - filippo@cumin1001"
* 00:16 eileen: civicrm revision changed from {{Gerrit|8c77e9e915}} to {{Gerrit|52c68911c6}}, config revision is {{Gerrit|561ae21f77}}
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35346 and previous config saved to /var/cache/conftool/dbconfig/20221004-133442-root.json
* 00:10 niharika29@deploy1001: Synchronized wmf-config/logging.php: Make the logstash and authmanager-statsd Monolog handlers compatible (duration: 01m 04s)
* 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
* 00:08 mutante: creating mcrouter certs for mw1350
* 13:31 jbond: re-enable puppet after deploying puppetmaster change 838144
* 13:30 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
* 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
* 13:30 awight@deploy1002: awight and awight: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:29 awight@deploy1002: Started scap: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]]
* 13:28 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
* 13:27 awight@deploy1002: Finished scap: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]] (duration: 05m 16s)
* 13:24 jbond: disable puppet to deploy a puppetmaster change 838144
* 13:22 awight@deploy1002: awight and stang: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:21 awight@deploy1002: Started scap: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]]
* 13:21 awight@deploy1002: Finished scap: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]] (duration: 12m 48s)
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35345 and previous config saved to /var/cache/conftool/dbconfig/20221004-131937-root.json
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 awight@deploy1002: awight and stang: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:08 awight@deploy1002: Started scap: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]]
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35343 and previous config saved to /var/cache/conftool/dbconfig/20221004-130432-root.json
* 12:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35342 and previous config saved to /var/cache/conftool/dbconfig/20221004-124927-root.json
* 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35341 and previous config saved to /var/cache/conftool/dbconfig/20221004-123422-root.json
* 12:31 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]] (duration: 00m 58s)
* 12:30 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]]
* 12:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 12:26 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]] (duration: 00m 14s)
* 12:26 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]]
* 12:21 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35340 and previous config saved to /var/cache/conftool/dbconfig/20221004-121917-root.json
* 12:14 volans: uploaded python3-gjson_0.1.0 to apt.wikimedia.org bullseye-wikimedia
* 12:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 12:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host sessionstore2001.codfw.wmnet with OS buster
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35339 and previous config saved to /var/cache/conftool/dbconfig/20221004-120413-root.json
* 11:55 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 11:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
* 11:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
* 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 11:22 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 11:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 11:05 jayme: published calico 3.23.3 debian packages in bullseye component/calico323 as well as corresponding docker images - [[phab:T307943|T307943]]
* 11:04 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
* 10:55 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:54 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS buster
* 10:53 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 10:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 135158
* 10:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 135158
* 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9119
* 10:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9119
* 10:41 moritzm: installing expat security updates
* 09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.maps.roll-restart (exit_code=1) rolling restart_daemons on A:maps-codfw
* 09:47 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 09:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 09:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 09:42 jayme: deployed istio-ingressgateway with additional envoy native metrics to wikikube codfw and eqiad
* 09:40 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
* 09:37 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-codfw
* 09:36 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
* 09:36 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
* 09:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 20 hosts
* 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 20 hosts
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35338 and previous config saved to /var/cache/conftool/dbconfig/20221004-093530-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35337 and previous config saved to /var/cache/conftool/dbconfig/20221004-092025-root.json
* 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35336 and previous config saved to /var/cache/conftool/dbconfig/20221004-090520-root.json
* 08:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: php7.2 removal
* 08:55 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: php7.2 removal
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35335 and previous config saved to /var/cache/conftool/dbconfig/20221004-085015-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35334 and previous config saved to /var/cache/conftool/dbconfig/20221004-083511-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35333 and previous config saved to /var/cache/conftool/dbconfig/20221004-082005-root.json
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35332 and previous config saved to /var/cache/conftool/dbconfig/20221004-080500-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P35331 and previous config saved to /var/cache/conftool/dbconfig/20221004-080338-root.json
* 07:52 moritzm: installing libdatetime-timezone-perl updates (catching up with latest timezone changes)
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35330 and previous config saved to /var/cache/conftool/dbconfig/20221004-074955-root.json
* 07:36 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 07:36 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35329 and previous config saved to /var/cache/conftool/dbconfig/20221004-072158-root.json
* 07:16 elukey: restart kafka on kafka-logging1001 to pick up its new PKI TLS cert
* 07:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35328 and previous config saved to /var/cache/conftool/dbconfig/20221004-070653-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35327 and previous config saved to /var/cache/conftool/dbconfig/20221004-065148-root.json
* 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35326 and previous config saved to /var/cache/conftool/dbconfig/20221004-063643-root.json
* 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 25885
* 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 25885
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35325 and previous config saved to /var/cache/conftool/dbconfig/20221004-062138-root.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35324 and previous config saved to /var/cache/conftool/dbconfig/20221004-060633-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35323 and previous config saved to /var/cache/conftool/dbconfig/20221004-055128-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35322 and previous config saved to /var/cache/conftool/dbconfig/20221004-053623-root.json
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-02-18 ==
== 2022-10-03 ==
* 23:56 mutante: mw1349 - scap pull
* 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1349.eqiad.wmnet
* 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
* 23:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1349.eqiad.wmnet
* 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 23:34 maryum: running reindex on mwmaint1002 - [[phab:T194448|T194448]]
* 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 23:28 maryum: running reindex for wikimedia wikis
* 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
* 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
* 23:12 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2151.wmnet
* 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work
* 23:12 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2150.wmnet
* 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
* 23:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:22 ryankemper: [Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '<nowiki>{</nowiki>"transient":<nowiki>{</nowiki>"cluster.routing.allocation.exclude":<nowiki>{</nowiki>"_host": "","_name": "elastic1066-production-search-psi-eqiad"<nowiki>}}}</nowiki>'`); will restart elasticsearch-psi after shards drain
* 22:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Enable ores_articletopics field creation for all wikis (extra sync for [[phab:T236104|T236104]]) (duration: 01m 04s)
* 19:15 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 22:54 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Enable ores_articletopics field creation for all wikis (duration: 01m 03s)
* 18:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 22:52 chaomodus: completed upgrading Netbox to 2.7.4 [[phab:T244291|T244291]]
* 18:41 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 22:51 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part3) (duration: 00m 11s)
* 18:34 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 22:51 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part3)
* 18:30 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:49 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part2) (duration: 01m 19s)
* 18:30 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
* 22:48 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part2)
* 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (duration: 01m 19s)
* 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:45 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]]
* 18:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244185|T244185]] Raise minimum log level for 'OAuth' from DEBUG to INFO (duration: 01m 04s)
* 18:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 22:30 chaomodus: Upgrading Netbox to 2.7.4
* 18:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:56 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:54 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 17:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:26 XioNoX: rollback tcp-mss clamping in eqiad/eqord
* 17:41 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003
* 21:07 jeh: power down and set icinga downtime on cloudvirt1022 [[phab:T243536|T243536]]
* 17:41 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4003
* 21:07 jeh: power down and set icinga downtime on cloudvirt1022 [[phab:T241884|T241884]]
* 17:40 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:54 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventStreamConfig extension on metawiki - [[phab:T242122|T242122]] (duration: 01m 03s)
* 17:37 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:47 ppchelko@deploy1001: Finished deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group [[phab:T244387|T244387]] (duration: 07m 59s)
* 17:29 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
* 20:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventStreamConfig extension on testwiki - [[phab:T242122|T242122]] (duration: 01m 04s)
* 17:29 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 837727: remove dns4001 for anycast neighbors."
* 20:39 ppchelko@deploy1001: Started deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group [[phab:T244387|T244387]]
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4001.wikimedia.org
* 20:06 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/libs/StatusValue.php: [[phab:T245155|T245155]] StatusValue: Fix __toString() to not choke on special parameters (duration: 01m 04s)
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.20 [[phab:T233868|T233868]]
* 17:08 robh@cumin2002: START - Cookbook sre.dns.netbox
* 19:52 jforrester@deploy1001: Finished scap: testwiki to 1.35.0-wmf.20 and re-build l10n cache [[phab:T233868|T233868]] (duration: 61m 01s)
* 17:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4001.wikimedia.org
* 19:41 papaul: shutting down dns2001 for 10G card troubleshooting
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:30 James_F: Running `foreachwiki sql.php php-1.35.0-wmf.19/maintenance/archives/patch-watchlist_expiry.sql` for [[phab:T244631|T244631]]
* 16:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:51 jforrester@deploy1001: Started scap: testwiki to 1.35.0-wmf.20 and re-build l10n cache [[phab:T233868|T233868]]
* 16:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:49 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.18 (duration: 15m 29s)
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:25 James_F: Running `scap prep` for 1.35.0-wmf.20 ref. [[phab:T233868|T233868]]
* 16:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
* 18:01 James_F: 1.35.0-wmf.20 was branched at {{Gerrit|c664b4f1b933d110bd69f074c399695bd6b17d13}} for [[phab:T233868|T233868]]
* 16:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
* 18:01 marxarelli: completed promotion of 1.35.0-wmf.19 to all wikis ([[phab:T233867|T233867]])
* 16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:52 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Re-roll all wikis to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:47 marxarelli: re-rolling wmf.19 to all wikis ([[phab:T233867|T233867]]) with eyes particularly on ([[phab:T245202|T245202]])
* 16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:28 bblack: cp3 (esams edge) - revert GRE MTU mitigations - [[phab:T232602|T232602]]
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:00 papaul: resetting ps1-a8-codfw, see [[phab:T245164|T245164]]
* 16:24 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] (duration: 04m 16s)
* 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:12 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:11 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:20 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 16:08 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 16:20 urbanecm@deploy1002: Started scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]]
* 16:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cae49b85d2d780e34b553789d56d76bac4a62c48}}: throttle: Add throttle rule for 2022-10-06 ([[phab:T319212|T319212]]) (duration: 04m 21s)
* 16:02 ottomata: deploying new 'canary' and 'production' releases for eventgate-main. (These releases use a new nodePort, and so will not be active until LVS is modified. The old 'main' release and nodePort is left as is.) - [[phab:T242861|T242861]]
* 16:14 sukhe: disable Puppet on cp hosts in codfw: rolling out [[phab:T309651|T309651]]
* 16:02 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 15:15 sukhe: disable Puppet on cp hosts in ulsfo: rolling out [[phab:T309651|T309651]]
* 15:51 bblack: dns2001 - shutdown for hw/reimage work - [[phab:T242017|T242017]]
* 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35320 and previous config saved to /var/cache/conftool/dbconfig/20221003-151438-root.json
* 15:47 bblack: dns2001 - stopping bgp to drain service for hw/reimage work - [[phab:T242017|T242017]]
* 15:06 papaul: maintenance complete on mr1-esams
* 15:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35319 and previous config saved to /var/cache/conftool/dbconfig/20221003-145933-root.json
* 15:40 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35318 and previous config saved to /var/cache/conftool/dbconfig/20221003-144428-root.json
* 15:36 jynus: stopping db1140:s3 instance
* 14:35 sukhe: upgrade A:cp and A:drmrs to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 15:35 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:31 papaul: ongoing maintenance on mr1-esams
* 15:34 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35317 and previous config saved to /var/cache/conftool/dbconfig/20221003-142923-root.json
* 15:34 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35316 and previous config saved to /var/cache/conftool/dbconfig/20221003-141417-root.json
* 15:14 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:08 sukhe: upgrade cp4026, cp4032 to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 15:08 vgutierrez@puppetmaster1001: conftool action : set/weight=100; selector: dc=eqiad,cluster=cache_text,service=ats-be,name=cp1089.eqiad.wmnet
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35315 and previous config saved to /var/cache/conftool/dbconfig/20221003-135912-root.json
* 15:04 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 13:57 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm2_amd64.changes: [[phab:T309651|T309651]]
* 14:56 bblack: esams repooled in DNS
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35314 and previous config saved to /var/cache/conftool/dbconfig/20221003-134407-root.json
* 14:54 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35313 and previous config saved to /var/cache/conftool/dbconfig/20221003-134024-root.json
* 14:54 ottomata: deploying new 'canary' and 'production' releases for eventgate-analytics.  (These releases use a new nodePort, and so will not be active until LVS is modified.  The old 'analytics' release and nodePort is left as is.) - [[phab:T242861|T242861]]
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35312 and previous config saved to /var/cache/conftool/dbconfig/20221003-132902-root.json
* 14:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35311 and previous config saved to /var/cache/conftool/dbconfig/20221003-132519-root.json
* 14:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 13:18 vgutierrez: enforcing origin-form{{!}}asterisk-form for request-target on varnish (could trigger spikes of HTTP 400 errors) - [[phab:T318676|T318676]]
* 14:39 XioNoX: remove cr2-esams VRRP handicap - [[phab:T243080|T243080]]
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35310 and previous config saved to /var/cache/conftool/dbconfig/20221003-131014-root.json
* 14:34 XioNoX: restore default esams-eqiad link cost - [[phab:T243080|T243080]]
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35308 and previous config saved to /var/cache/conftool/dbconfig/20221003-125509-root.json
* 14:33 XioNoX: re-enable cr2-esams BGP transit/peering - [[phab:T243080|T243080]]
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35307 and previous config saved to /var/cache/conftool/dbconfig/20221003-124004-root.json
* 14:31 XioNoX: cr2-esams - request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35306 and previous config saved to /var/cache/conftool/dbconfig/20221003-122459-root.json
* 14:29 XioNoX: re-disable cr2-esams BGP group IX4 - [[phab:T243080|T243080]]
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35305 and previous config saved to /var/cache/conftool/dbconfig/20221003-120954-root.json
* 14:14 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/DiscussionTools: [[gerrit:572882{{!}}wmf.18: Add config option and query parameter to control loading]] (duration: 01m 11s)
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P35303 and previous config saved to /var/cache/conftool/dbconfig/20221003-120208-root.json
* 14:02 cdanis: depool esams
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 14:01 XioNoX: re-enable cr2-esams BGP group IX4 - [[phab:T243080|T243080]]
* 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 25 -> 50 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10448 and previous config saved to /var/cache/conftool/dbconfig/20200218-135525-marostegui.json
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 13:44 XioNoX: installing OS on cr2-esams:re0 - [[phab:T243080|T243080]]
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 13:39 XioNoX: cr2-esams - request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35302 and previous config saved to /var/cache/conftool/dbconfig/20221003-115449-root.json
* 13:37 XioNoX: deactivate peering/transit on cr2-esams - [[phab:T243080|T243080]]
* 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 13:24 XioNoX: reboot cr2-esams:re1 (backup) - [[phab:T243080|T243080]]
* 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 13:23 XioNoX: bump cost of eqiad-esams transport - [[phab:T243080|T243080]]
* 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 13:10 XioNoX: fail vrrp master to cr3-esams - [[phab:T243080|T243080]]
* 11:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 12:58 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS buster
* 12:55 Amir1: EU SWAT done
* 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 12:53 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572731{{!}}Add DiscussionTools to four wikis in hidden mode (T244870)]], take II (duration: 01m 03s)
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 12:52 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572731{{!}}Add DiscussionTools to four wikis in hidden mode (T244870)]] (duration: 01m 04s)
* 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 12:45 XioNoX: remove graceful-switchover and nonstop-routing from cr2-esams - [[phab:T243080|T243080]]
* 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 12:36 XioNoX: push new Junos to cr2-esams:re1 (backup RE, noop) - [[phab:T243080|T243080]]
* 10:52 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS buster
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part II (duration: 01m 03s)
* 10:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part I, take II (the cache issue) (duration: 01m 04s)
* 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 12:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part I (duration: 01m 06s)
* 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 12:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572628{{!}}Start reading for the new term store for clients up to Q1000 (T225057)]] (duration: 01m 05s)
* 10:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS buster
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4b193dd}}: Increase Commons linkpurge rate limit for patrollers ([[phab:T245214|T245214]]) (duration: 01m 31s)
* 10:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 11:51 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 10:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 11:48 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 10:39 hnowlan: starting cassandra on reimaged sessionstore1002
* 11:47 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 10:37 _joe_: remove stale druid.svc.eqiad.wmnet certificate from the puppetmaster CA; it was expired anyways
* 11:43 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 10:32 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 11:41 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 10:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 11:35 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 10:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 11:27 jynus: reenabling prometheus exporter metadata user for prometheus1003
* 10:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 11:10 jynus: temp. disabling prometheus exporter metadata user for prometheus1003
* 10:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 15 -> 25 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10445 and previous config saved to /var/cache/conftool/dbconfig/20200218-104958-marostegui.json
* 10:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS buster
* 09:27 gehel: re-enable puppet on mw* - [[phab:T222321|T222321]]
* 10:00 hnowlan: c-foreach-nt drain on sessionstore1002
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10444 and previous config saved to /var/cache/conftool/dbconfig/20200218-091343-marostegui.json
* 10:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 09:09 gehel: disabling puppet on mw* to deploy apache config change - [[phab:T222321|T222321]]
* 10:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 09:07 volans: rm /var/log/exim4/paniclog on cumin1001 to clear the OOM error from last week
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35300 and previous config saved to /var/cache/conftool/dbconfig/20221003-092519-root.json
* 08:59 marostegui: Remove wikiadmin2 grants from es1 [[phab:T243512|T243512]]
* 09:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31133
* 08:59 marostegui: Remove wikiadmin2 grants from es1
* 09:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31133
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options', diff saved to https://phabricator.wikimedia.org/P10443 and previous config saved to /var/cache/conftool/dbconfig/20200218-085713-marostegui.json
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62044
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10442 and previous config saved to /var/cache/conftool/dbconfig/20200218-082306-marostegui.json
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62044
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10441 and previous config saved to /var/cache/conftool/dbconfig/20200218-080952-marostegui.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35299 and previous config saved to /var/cache/conftool/dbconfig/20221003-091014-root.json
* 08:08 marostegui: Restart MySQL to pick up optimizer_switch changes - [[phab:T245489|T245489]]
* 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10440 and previous config saved to /var/cache/conftool/dbconfig/20200218-080623-marostegui.json
* 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 07:34 elukey: powercycle analytics1065 (crashed hours ago, no mgmt console available, no ssh)
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P35297 and previous config saved to /var/cache/conftool/dbconfig/20221003-085840-root.json
* 06:39 marostegui: Remove wikiadmin2 from pc1007, pc1008, pc1009 and pc1010 [[phab:T243512|T243512]]
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35296 and previous config saved to /var/cache/conftool/dbconfig/20221003-085509-root.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 100 -> 200 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10439 and previous config saved to /var/cache/conftool/dbconfig/20200218-063819-marostegui.json
* 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12975
* 06:27 marostegui: Stop haproxy on dbproxy1007 - [[phab:T245385|T245385]]
* 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12975
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 and weight 10 in API for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10438 and previous config saved to /var/cache/conftool/dbconfig/20200218-062459-marostegui.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35295 and previous config saved to /var/cache/conftool/dbconfig/20221003-085007-root.json
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:40 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5001.eqsin.wmnet
* 06:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35294 and previous config saved to /var/cache/conftool/dbconfig/20221003-084004-root.json
* 08:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3303
* 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35293 and previous config saved to /var/cache/conftool/dbconfig/20221003-083729-root.json
* 08:36 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
* 08:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35292 and previous config saved to /var/cache/conftool/dbconfig/20221003-083502-root.json
* 08:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5001.eqsin.wmnet
* 08:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15557
* 08:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15557
* 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12975
* 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12975
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35291 and previous config saved to /var/cache/conftool/dbconfig/20221003-082459-root.json
* 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30781
* 08:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30781
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35290 and previous config saved to /var/cache/conftool/dbconfig/20221003-082224-root.json
* 08:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39386
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35289 and previous config saved to /var/cache/conftool/dbconfig/20221003-081955-root.json
* 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39386
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35288 and previous config saved to /var/cache/conftool/dbconfig/20221003-080954-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35287 and previous config saved to /var/cache/conftool/dbconfig/20221003-080719-root.json
* 08:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35286 and previous config saved to /var/cache/conftool/dbconfig/20221003-080556-root.json
* 08:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35285 and previous config saved to /var/cache/conftool/dbconfig/20221003-080451-root.json
* 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178', diff saved to https://phabricator.wikimedia.org/P35284 and previous config saved to /var/cache/conftool/dbconfig/20221003-075643-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35283 and previous config saved to /var/cache/conftool/dbconfig/20221003-075449-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35282 and previous config saved to /var/cache/conftool/dbconfig/20221003-075214-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35281 and previous config saved to /var/cache/conftool/dbconfig/20221003-075051-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35280 and previous config saved to /var/cache/conftool/dbconfig/20221003-074946-root.json
* 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16637
* 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16637
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35279 and previous config saved to /var/cache/conftool/dbconfig/20221003-073944-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35278 and previous config saved to /var/cache/conftool/dbconfig/20221003-073709-root.json
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 XioNoX: cr2-drmrs# set chassis fpc 0 sampling-instance pmacct
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35277 and previous config saved to /var/cache/conftool/dbconfig/20221003-073627-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200', diff saved to https://phabricator.wikimedia.org/P35276 and previous config saved to /var/cache/conftool/dbconfig/20221003-073556-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35275 and previous config saved to /var/cache/conftool/dbconfig/20221003-073546-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35274 and previous config saved to /var/cache/conftool/dbconfig/20221003-073441-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35273 and previous config saved to /var/cache/conftool/dbconfig/20221003-072741-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35272 and previous config saved to /var/cache/conftool/dbconfig/20221003-072204-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35271 and previous config saved to /var/cache/conftool/dbconfig/20221003-072122-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35270 and previous config saved to /var/cache/conftool/dbconfig/20221003-072041-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35269 and previous config saved to /var/cache/conftool/dbconfig/20221003-071936-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35268 and previous config saved to /var/cache/conftool/dbconfig/20221003-071236-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35267 and previous config saved to /var/cache/conftool/dbconfig/20221003-070659-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35266 and previous config saved to /var/cache/conftool/dbconfig/20221003-070617-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35265 and previous config saved to /var/cache/conftool/dbconfig/20221003-070536-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35264 and previous config saved to /var/cache/conftool/dbconfig/20221003-070431-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P35263 and previous config saved to /var/cache/conftool/dbconfig/20221003-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35262 and previous config saved to /var/cache/conftool/dbconfig/20221003-065731-root.json
* 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6128
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35261 and previous config saved to /var/cache/conftool/dbconfig/20221003-065154-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35260 and previous config saved to /var/cache/conftool/dbconfig/20221003-065112-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35259 and previous config saved to /var/cache/conftool/dbconfig/20221003-065031-root.json
* 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6128
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P35258 and previous config saved to /var/cache/conftool/dbconfig/20221003-064638-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35257 and previous config saved to /var/cache/conftool/dbconfig/20221003-064226-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35256 and previous config saved to /var/cache/conftool/dbconfig/20221003-063607-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35255 and previous config saved to /var/cache/conftool/dbconfig/20221003-063527-root.json
* 06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11039
* 06:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11039
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35254 and previous config saved to /var/cache/conftool/dbconfig/20221003-062721-root.json
* 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5400
* 06:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5400
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35253 and previous config saved to /var/cache/conftool/dbconfig/20221003-062102-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35252 and previous config saved to /var/cache/conftool/dbconfig/20221003-062022-root.json
* 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
* 06:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35251 and previous config saved to /var/cache/conftool/dbconfig/20221003-061216-root.json
* 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35250 and previous config saved to /var/cache/conftool/dbconfig/20221003-060557-root.json
* 06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35249 and previous config saved to /var/cache/conftool/dbconfig/20221003-055711-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P35248 and previous config saved to /var/cache/conftool/dbconfig/20221003-055401-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35247 and previous config saved to /var/cache/conftool/dbconfig/20221003-055052-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P35246 and previous config saved to /var/cache/conftool/dbconfig/20221003-054245-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35245 and previous config saved to /var/cache/conftool/dbconfig/20221003-054206-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P35244 and previous config saved to /var/cache/conftool/dbconfig/20221003-052927-root.json


== 2020-02-17 ==
== 2022-10-02 ==
* 19:56 cdanis: finish enabling TCP-MSS clamping in eqiad
* 08:13 elukey: `apt-get clean` on an-airflow1001 to free some space on the root partition
* 19:49 cdanis: s/no-op//
* 19:49 cdanis: no-op enable TCP-MSS clamping on eqord and eqiad
* 19:33 cdanis: no-op enable flowspec change on cr2-eqord and cr2-eqiad
* 18:25 elukey: restart kafka on kafka-jumbo1001 to pick up new openjdk updates
* 17:25 bblack: GRE MTU mitigations applied to esams cp hosts only - [[phab:T232602|T232602]]
* 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:50 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 15:48 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:44 cdanis: ✔️ cdanis@icinga1001.wikimedia.org ~ 🕥☕ sudo systemctl restart ircecho
* 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10422 and previous config saved to /var/cache/conftool/dbconfig/20200217-143146-marostegui.json
* 14:17 ema: reprepro includedeb buster-wikimedia ~ema/cadvisor_0.35.0+ds1-4_amd64.deb [[phab:T183146|T183146]]
* 12:34 XioNoX: add test flowspec rules to cr3-knams
* 12:34 moritzm: installing postgresql-9.4 security updates
* 12:27 vgutierrez: reboot acmechief instances (kernel upgrade)
* 10:31 jynus: dropping all databases from db1140:3313
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 increase API weight from 10 to 15 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10420 and previous config saved to /var/cache/conftool/dbconfig/20200217-102218-marostegui.json
* 10:20 vgutierrez: rolling restart of ats-tls and varnish-fe on ulsfo to enable KA between them - [[phab:T244464|T244464]]
* 10:00 moritzm: installing Linux 4.9.210 kernels on stretch systems
* 09:10 godog: correction, +100G
* 09:09 godog: +10G to prometheus/ops fs on prometheus eqiad - [[phab:T245361|T245361]]
* 09:06 godog: +50G to prometheus/ops fs on prometheus eqiad - [[phab:T245361|T245361]]
* 07:22 marostegui: Stop haproxy on dbproxy1002 - [[phab:T245384|T245384]]


== 2020-02-15 ==
== 2022-10-01 ==
* 01:01 cdanis: ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕗🍺 sudo systemctl restart hive-server2.service ; sudo systemctl restart hive-metastore.service
* 13:24 fab@deploy1002: Finished deploy [airflow-dags/research@44a1158]: (no justification provided) (duration: 00m 08s)
* 13:24 fab@deploy1002: Started deploy [airflow-dags/research@44a1158]: (no justification provided)
* 13:12 fab@deploy1002: Finished deploy [airflow-dags/research@d6b3e82]: (no justification provided) (duration: 03m 35s)
* 13:08 fab@deploy1002: Started deploy [airflow-dags/research@d6b3e82]: (no justification provided)


== 2020-02-14 ==
== 2022-09-30 ==
* 23:42 XenoRyet: updated civicrm from {{Gerrit|cf86495d44}} to {{Gerrit|8c77e9e915}}
* 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:01 volker-e@deploy1001: Finished deploy [design/style-guide@1928c00]: Deploy design/style-guide:  (duration: 00m 09s)
* 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:01 volker-e@deploy1001: Started deploy [design/style-guide@1928c00]: Deploy design/style-guide:
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
* 20:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Prevent some logspam [[phab:T245280|T245280]] (duration: 01m 05s)
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
* 19:27 XenoRyet: updated civicrm from {{Gerrit|55b2afb6eb}} to {{Gerrit|cf86495d44}}
* 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
* 19:10 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase: [[phab:T245062|T245062]] Prevent invalid term languages from cached PrefetchingTermLookup (duration: 01m 09s)
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
* 17:37 jforrester@deploy1001: Unlocked for deployment [ALL REPOSITORIES]: Testing [[phab:T245062|T245062]] fix on mwdebug1001 (duration: 03m 05s)
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 17:33 jforrester@deploy1001: Locking from deployment [ALL REPOSITORIES]: Testing [[phab:T245062|T245062]] fix on mwdebug1001 (planned duration: 60m 00s)
* 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 16:11 moritzm: installing git-lfs updates from Buster 10.3 point update
* 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 15:55 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb10u1 to apt.wikimedia.org
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 15:55 bblack: (log(n))
* 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10414 and previous config saved to /var/cache/conftool/dbconfig/20200214-155443-marostegui.json
* 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 15:52 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb9u1 to apt.wikimedia.org
* 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
* 15:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Resync InitialiseSettings to try and pick up previously deployed cirrus query routing changes (duration: 01m 05s)
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
* 15:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 15:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 15:32 effie: restart mc-gp* for updates
* 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
* 15:17 bd808: Toil reduction: !log messages now work from the SRE team's Freenode channel.
* 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 13:50 gehel: restart relforge for JVM upgrade - [[phab:T245120|T245120]]
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
* 10:35 vgutierrez: revert ats 8.0.6-rc0 experiment on cp40[26,32]
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
* 10:14 vgutierrez: rolling restart of ats-be to enable TLSv1.3 against origin servers - [[phab:T170567|T170567]]
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10409 and previous config saved to /var/cache/conftool/dbconfig/20200214-093456-marostegui.json
* 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 09:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:51 moritzm: installing puppetdb-test2001 [[phab:T318931|T318931]]
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:25 volans: manually absented /usr/local/bin/apt2xml on the 5 hosts with puppet disabled
* 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and previous config saved to /var/cache/conftool/dbconfig/20220930-123123-root.json
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35228 and previous config saved to /var/cache/conftool/dbconfig/20220930-121618-root.json
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35227 and previous config saved to /var/cache/conftool/dbconfig/20220930-120113-root.json
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetdb-test2001.codfw.wmnet
* 08:46 moritzm: installing 4.19.98 kernel update on Buster systems
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35226 and previous config saved to /var/cache/conftool/dbconfig/20220930-114605-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10408 and previous config saved to /var/cache/conftool/dbconfig/20200214-080600-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35225 and previous config saved to /var/cache/conftool/dbconfig/20220930-113101-root.json
* 06:51 vgutierrez: updating puppet compiler facts
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P35224 and previous config saved to /var/cache/conftool/dbconfig/20220930-112307-root.json
* 01:27 dpifke@deploy1001: Finished deploy [performance/navtiming@2eec00a]: (no justification provided) (duration: 00m 05s)
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetdb-test2001.codfw.wmnet on all recursors
* 01:27 dpifke@deploy1001: Started deploy [performance/navtiming@2eec00a]: (no justification provided)
* 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache puppetdb-test2001.codfw.wmnet on all recursors
* 00:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245202|T245202]] cirrus: Move all move_like traffic to codfw (duration: 01m 02s)
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:51 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: [[phab:T245202|T245202]] cirrus: Increase the pool counter limits a bit (duration: 01m 05s)
* 11:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb-test2001.codfw.wmnet
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35223 and previous config saved to /var/cache/conftool/dbconfig/20220930-104004-ladsgroup.json
* 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35222 and previous config saved to /var/cache/conftool/dbconfig/20220930-103943-ladsgroup.json
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35221 and previous config saved to /var/cache/conftool/dbconfig/20220930-102436-ladsgroup.json
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35220 and previous config saved to /var/cache/conftool/dbconfig/20220930-100930-ladsgroup.json
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35219 and previous config saved to /var/cache/conftool/dbconfig/20220930-095423-ladsgroup.json
* 09:42 moritzm: installing Linux 5.10.140 updates on Bullseye hosts (released via 11.5 point release), just rollout of the package, no reboots involved
* 07:37 XioNoX: add RPKI ROAs for 185.71.138.0/24 and 2001:67c:930::/48
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
* 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52320
* 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 52320
* 07:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
* 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35218 and previous config saved to /var/cache/conftool/dbconfig/20220930-070454-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35217 and previous config saved to /var/cache/conftool/dbconfig/20220930-065844-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35216 and previous config saved to /var/cache/conftool/dbconfig/20220930-064949-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35215 and previous config saved to /var/cache/conftool/dbconfig/20220930-064339-root.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35214 and previous config saved to /var/cache/conftool/dbconfig/20220930-063444-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35213 and previous config saved to /var/cache/conftool/dbconfig/20220930-062834-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35212 and previous config saved to /var/cache/conftool/dbconfig/20220930-061939-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35211 and previous config saved to /var/cache/conftool/dbconfig/20220930-061329-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35210 and previous config saved to /var/cache/conftool/dbconfig/20220930-060434-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35209 and previous config saved to /var/cache/conftool/dbconfig/20220930-055824-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35208 and previous config saved to /var/cache/conftool/dbconfig/20220930-054929-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35207 and previous config saved to /var/cache/conftool/dbconfig/20220930-054319-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35206 and previous config saved to /var/cache/conftool/dbconfig/20220930-053424-root.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35204 and previous config saved to /var/cache/conftool/dbconfig/20220930-052814-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35203 and previous config saved to /var/cache/conftool/dbconfig/20220930-051919-root.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35202 and previous config saved to /var/cache/conftool/dbconfig/20220930-051309-root.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P35201 and previous config saved to /var/cache/conftool/dbconfig/20220930-051206-root.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P35200 and previous config saved to /var/cache/conftool/dbconfig/20220930-050533-root.json
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35199 and previous config saved to /var/cache/conftool/dbconfig/20220930-041937-ladsgroup.json
* 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35198 and previous config saved to /var/cache/conftool/dbconfig/20220930-041916-ladsgroup.json
* 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35197 and previous config saved to /var/cache/conftool/dbconfig/20220930-040409-ladsgroup.json
* 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35196 and previous config saved to /var/cache/conftool/dbconfig/20220930-034903-ladsgroup.json
* 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35195 and previous config saved to /var/cache/conftool/dbconfig/20220930-033356-ladsgroup.json
* 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye


== 2020-02-13 ==
== 2022-09-29 ==
* 22:13 jeh: running filesystem tests on cloudvirt1024 [[phab:T241884|T241884]]
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
* 21:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
* 21:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
* 21:40 jbond42: refresh facts on compilers
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
* 21:38 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
* 21:37 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 21:35 ottomata: deploying production and canary releases for eventgate-logging-external (and destroying


== 2020-02-12 ==
== 2022-09-28 ==
* 23:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
* 23:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
* 23:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
* 23:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
* 23:11 XioNoX: deactivate BGP to office's router1 while it's on maintenance
* 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 21:59 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 21:58 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate
* 22:20 ejegg: updated fundraising CiviCRM from {{Gerrit|d31c19a0}} to {{Gerrit|f3461a44}}
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
* 21:21 ladsgroup@cumin1001


== 2020-02-11 ==
== 2022-09-27 ==
* 22:04 XioNoX: switchover RE mastership back to re0 on cr1-eqsin - [[phab:T243080|T243080]]
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 21:50 XioNoX: reboot re0:cr1-eqsin (backup) - [[phab:T243080|T243080]]
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 21:45 cdanis: repool eqiad
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 21:37 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp107.*
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:36 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp108.*
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 21:36 bblack: re-pooling all cp10xx in eqiad
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:32 XioNoX: switchover RE mastership on cr1-eqsin - [[phab:T243080|T243080]]
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 21:14 robh: cp1067 powered back into service post firmware update via [[phab:T243167|T243167]]
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 21:11 cdanis: depool eqiad
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 21:01 marxarelli: completed group0 to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 20:57 robh: cp108[45] returned to service, depooling cp108[67] for firmware update via [[phab:T243167|T243167]]
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 20:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.19
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 20:53 mutante: gerrit - moving gerrit db_pass from private module passwords to private hieradata
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:51 XioNoX: reboot backup RE on cr1-eqsin - [[phab:T243080|T243080]]
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 20:38 robh: depooling cp108[45] for firmware update via [[phab:T243167|T243167]]
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache (duration: 37m 31s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:19 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 02s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:19 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 21:12 TheresNoTime: closing UTC late backport window
* 20:18 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 03s)
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 20:18 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 XioNoX: depool eqsin for router upgrade - [[phab:T243080|T243080]]
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:01 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 04s)
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:01 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 19:55 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 19:43 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.16 (duration: 01m 48s)
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 19:42 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.15 (duration: 01m 51s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:38 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.14 (duration: 02m 08s)
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.11 (duration: 10m 53s)
* 20:59 TheresNoTime: extending UTC late backport window
* 19:35 marxarelli: running `scap clean --delete` for old wmf branches wmf.11, wmf.14, wmf.15, wmf.16 ([[phab:T233867|T233867]])
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:03 volans: uploaded spicerack_0.0.30-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:00 Urbanecm: Create User:Ammarpad on ngwikimedia and promote to sysop, bureaucrat ([[phab:T240771|T240771]])
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 18:43 twentyafterfour: getting ready to deploy wmf.18 refs [[phab:T233866|T233866]]
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 18:42 greg-g: restarting stashbot
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:35 bblack: ns1.wikimedia.org - changing static route destination on cr[12]-codfw from authdns2001 to dns2002 - [[phab:T242017|T242017]]
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:33 Urbanecm: Create ngwikimedia is done ([[phab:T240771|T240771]])
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 03s)
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 18:24 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Create ngwikimedia ([[phab:T240771|T240771]])
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@b471b64]: (no justification provided) (duration: 00m 05s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:20 dpifke@deploy1001: Started deploy [performance/navtiming@b471b64]: (no justification provided)
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 18:19 urbanecm@deploy1001: Synchronized dblists/: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 17:57 bblack: reboot dns2002 post-reimaging
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 17:13 vgutierrez: Disable KA on cp4031 - [[phab:T244464|T244464]]
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:49 vgutierrez: pool cp3055 running buster - [[phab:T242093|T242093]]
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:43 vgutierrez: repooling cp4031
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 16:38 vgutierrez: depooling cp4031 for some KA tests
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:25 vgutierrez: pool cp3056 running buster - [[phab:T242093|T242093]]
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 16:23 bblack: dns2002 - shutting down for hardware work and reinstall - [[phab:T242017|T242017]]
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 16:21 bblack: dns2002 - stopping bird adverts to depool service for [[phab:T242017|T242017]]
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 16:20 bblack: dns2002 - downtimed in icinga for [[phab:T242017|T242017]]
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:38 vgutierrez: depool cp3056 and reimage as buster - [[phab:T242093|T242093]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:36 vgutierrez: pool cp3058 running buster - [[phab:T242093|T242093]]
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Configuring test.event stream in beta, no-op in prod - [[phab:T242122|T242122]] (duration: 01m 08s)
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 15:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 15:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 14:58 vgutierrez: depool cp3055 and reimage as buster - [[phab:T242093|T242093]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:56 vgutierrez: pool cp3057 running buster - [[phab:T242093|T242093]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:52 moritzm: pruning old CAS logs (predating the current logger config for /var/log/cas/*) from idp1001/idp2001
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 14:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 14:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:21 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --custom-groups checkuser
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:20 vgutierrez: restart varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 14:07 vgutierrez: depool cp3057 and cp3058 and reimage as buster - [[phab:T242093|T242093]]
* 20:15 samtar@deploy1002: Started scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]]
* 13:52 vgutierrez: pool cp3059 and cp3060 running buster - [[phab:T242093|T242093]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10382 and previous config saved to /var/cache/conftool/dbconfig/20200211-130343-marostegui.json
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:34 Amir1: EU SWAT is done
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] (duration: 05m 46s)
* 12:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:28 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]], take II, cache (duration: 01m 06s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:26 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]] (duration: 01m 05s)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]], Second round, cache issue (duration: 01m 07s)
* 20:04 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 12:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]] (duration: 01m 11s)
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]]
* 12:04 vgutierrez: depool cp3059 and cp3060 and reimage as buster - [[phab:T242093|T242093]]
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 11:59 vgutierrez: repool cp3061 and cp3062 running buster - [[phab:T242093|T242093]]
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 11:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
* 11:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 11:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 11:20 vgutierrez: ats-tls effectively reusing connections between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:56 vgutierrez: depool cp3062 and reimage as buster - [[phab:T242093|T242093]]
* 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:54 vgutierrez: repool cp3064 running buster - [[phab:T242093|T242093]]
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:51 vgutierrez: depool cp3061 and reimage as buster - [[phab:T242093|T242093]]
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:50 vgutierrez: repool cp5006 and cp3063 running buster - [[phab:T242093|T242093]]
* 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 10:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:02 brennen: 1.40.0-wmf.3 ([[phab:T314192|T314192]]) no current blockers, promoting to group0
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
* 10:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 10:25 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 10:18 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 10:11 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 10:07 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 09:57 vgutierrez: depool cp3063 and cp3064 and reimage as buster - [[phab:T242093|T242093]]
* 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 09:52 vgutierrez: depool cp5006 and reimage as buster - [[phab:T242093|T242093]]
* 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 09:52 vgutierrez: pool cp5007 running buster - [[phab:T242093|T242093]]
* 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1107 weight from 10 to 11', diff saved to https://phabricator.wikimedia.org/P10380 and previous config saved to /var/cache/conftool/dbconfig/20200211-083812-marostegui.json
* 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 08:25 marostegui: Upgrade db1095:3312, db1095:3313
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10379 and previous config saved to /var/cache/conftool/dbconfig/20200211-082204-marostegui.json
* 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10378 and previous config saved to /var/cache/conftool/dbconfig/20200211-081421-marostegui.json
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 5 to 10 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10377 and previous config saved to /var/cache/conftool/dbconfig/20200211-081319-marostegui.json
* 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10376 and previous config saved to /var/cache/conftool/dbconfig/20200211-080458-marostegui.json
* 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
* 07:57 akosiaris: [[phab:T242705|T242705]] systemctl stop uwsgi-ores on ores2001.
* 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 07:54 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 07:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10375 and previous config saved to /var/cache/conftool/dbconfig/20200211-075358-marostegui.json
* 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
* 07:47 marostegui: Upgrade es1013 - [[phab:T239791|T239791]]
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 - [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10374 and previous config saved to /var/cache/conftool/dbconfig/20200211-074358-marostegui.json
* 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 07:23 vgutierrez: depool cp5007 and reimage as buster - [[phab:T242093|T242093]]
* 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 07:22 vgutierrez: pool cp5001 and cp5008 running buster - [[phab:T242093|T242093]]
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
* 07:21 marostegui: Remove partitions from db2086:3318 - [[phab:T239453|T239453]]
* 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10373 and previous config saved to /var/cache/conftool/dbconfig/20200211-071936-marostegui.json
* 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]], 710183 rows done
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10372 and previous config saved to /var/cache/conftool/dbconfig/20200211-071639-marostegui.json
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10371 and previous config saved to /var/cache/conftool/dbconfig/20200211-070720-marostegui.json
* 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
* 07:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
* 06:59 marostegui: Stop haproxy on dbproxy1001 - [[phab:T244463|T244463]]
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
* 06:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:48 marostegui: Remove grants in m1 for dbproxy1001 - [[phab:T231280|T231280]]
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:25 vgutierrez: depool cp5001 & cp5008 and reimage as buster - [[phab:T242093|T242093]]
* 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]]
* 06:18 marostegui: Failover m1-master from dbproxy1014 to dbproxy1012 - [[phab:T202367|T202367]]
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
* 00:26 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.18/skins/MinervaNeue: SWAT: Revert: Reduce userContributions icon code (duration: 01m 06s)
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give NS_HELP same weight as NS_MAIN in search on wikitech (duration: 01m 06s)
* 14:06 taavi@deploy1002: Finished scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] (duration: 06m 59s)
* 00:15 ebernhardson@deploy1001: Synchronized wmf-config/: SWAT: Enable SpecialMute page on all wikis (duration: 01m 06s)
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 taavi@deploy1002: taavi and migr: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:59 taavi@deploy1002: Started scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]]
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
* 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 jbond: upload new wmf-laptop_0.5.4 package
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
* 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update [[phab:T311686|T311686]]
* 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
* 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
* 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
* 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
* 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
* 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - [[phab:T310745|T310745]]
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
* 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - [[phab:T310745|T310745]]
* 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
* 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
* 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
* 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
* 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
* 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
* 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 [[phab:T318128|T318128]]
* 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 36m 01s)
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2020-02-10 ==
== 2022-09-26 ==
* 23:30 robh: cp108[23] returned to service via [[phab:T243167|T243167]]
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 23:28 legoktm: restarting zuul
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 23:26 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 04s)
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 23:25 reedy@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 07s)
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 23:06 robh: cp108[01] returned to service, cp108[23] offline for bios update via [[phab:T243167|T243167]]
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 22:50 chasemp: phab1001:~# sudo /srv/phab/phabricator/bin/bulk make-silent  --id 2164
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 22:45 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add authevents as monolog channel (duration: 01m 06s)
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 22:43 robh: cp107[789] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 22:42 robh: cp107[89] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 21:58 robh: cp107[56] returned to service, cp107[78] offline for bios update via [[phab:T243167|T243167]]
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 21:43 arlolra: Updated Parsoid to {{Gerrit|612106d2}} ([[phab:T244412|T244412]], [[phab:T244413|T244413]], [[phab:T242746|T242746]], [[phab:T235273|T235273]], [[phab:T235307|T235307]], [[phab:T238845|T238845]], [[phab:T204618|T204618]], [[phab:T240054|T240054]])
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 21:38 robh: cp1075 & cp1076 offline for bios updates per [[phab:T243167|T243167]]
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 21:36 robh: cp1075 and cp1076 going offline for bios updates. This will cause a bit of cp irc icinga noise, but no paging. Not putting into maint mode, as there is no way to maint mode the noisiest check (which checks all backends and thus shouldn't be disabled)
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 21:33 arlolra@deploy1001: Finished deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}} (duration: 10m 26s)
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 21:32 XioNoX: clamp tcp-mss on cr2-eqiad:xe-3/3/3
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:23 arlolra@deploy1001: Started deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}}
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:12 halfak@deploy1001: Finished deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]] (duration: 12m 18s)
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 halfak@deploy1001: Started deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]]
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:55 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 11s)
* 20:31 TheresNoTime: closing UTC late backport window
* 20:14 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 15s)
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 19:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:570393]] Config: Session Store: Switch group0 and group1 to kask-session [[phab:T243106|T243106]] (duration: 01m 06s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:28 mutante: Gerrit - added eevans to 'wmf-deployment' group ([[phab:T244508|T244508]])
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T242122|T242122]] Load new EventStreamConfig extension if so configured (duration: 01m 06s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:07 jforrester@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242122|T242122]] Set default of wmgUseEventStreamConfig false everywhere (duration: 01m 06s)
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 18:39 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 01m 05s)
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 18:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 18:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18 refs [[phab:T233867|T233867]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 18:21 twentyafterfour: MediaWiki train: finally moving forward with group0 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]]
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244561|T244561]] Set Kartographer servers to Wikimedia servers (duration: 01m 06s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:48 moritzm: installing libexif security updates on jessie
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:22 vgutierrez: pooling cp5002 and cp5009 running buster - [[phab:T242093|T242093]]
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:45 XioNoX: push outbound flowspec support to core routers
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after first day of 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10366 and previous config saved to /var/cache/conftool/dbconfig/20200210-154552-marostegui.json
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 15:33 godog: roll restart cassandra on session* to apply logging changes - [[phab:T242585|T242585]]
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 15:23 moritzm: uploading debdeploy 0.0.99.13 to apt.wikimedia.org
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 15:22 godog: roll restart cassandra on restbase* to apply logging changes - [[phab:T242585|T242585]]
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 15:06 marostegui: Reload haproxy on dbproxy1017 and dbproxy1017 - [[phab:T244209|T244209]]
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 15:04 twentyafterfour@deploy1001: Finished scap: full scap sync prior to wmf.18 rollout (duration: 20m 13s)
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 15:04 godog: roll restart cassandra on maps* to apply logging changes - [[phab:T242585|T242585]]
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 15:03 vgutierrez: rolling restart of ats-tls - [[phab:T240950|T240950]]
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 15:00 marostegui: Restart mysql on m5 master (wikitech will go down) - [[phab:T244209|T244209]]
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 14:52 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 14:46 vgutierrez: depool cp5002 and cp5009 and reimage as buster - [[phab:T242093|T242093]]
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 14:44 twentyafterfour@deploy1001: Started scap: full scap sync prior to wmf.18 rollout
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 14:42 vgutierrez: repool cp5003 and cp5010 running buster - [[phab:T242093|T242093]]
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 14:41 marostegui: Full-upgrade db1133 (without restarting mysql) - [[phab:T244209|T244209]]
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 14:40 twentyafterfour: MediaWiki Train: Running a full scap to prepare for moving forward to 1.35.0-wmf.18 ( [[phab:T233866|T233866]] )
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 14:32 marostegui: Downtime m5 hosts for the upcoming maintenance - [[phab:T244209|T244209]]
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 14:17 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 14:11 XioNoX: remove TCP-MSS clamping on cr3-knams
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 13:48 vgutierrez: depool cp5003 and reimage as buster - [[phab:T242093|T242093]]
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 13:47 vgutierrez: pooling cp5004 with buster - [[phab:T242093|T242093]]
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 13:46 vgutierrez: depool cp5010 and reimage as buster - [[phab:T242093|T242093]]
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 13:45 vgutierrez: pooling cp5011 with buster - [[phab:T242093|T242093]]
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 13:28 godog: roll restart cassandra on aqs to apply logging changes - [[phab:T242585|T242585]]
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 13:03 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase: [[gerrit:570911{{!}}Revert "wbterms: Set default for the term store to read new"]] ([[phab:T244529|T244529]]) (duration: 01m 00s)
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 13:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 12:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 12:58 Urbanecm: EU SWAT is done
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 12:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|989c9f8}}: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 58s)
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 12:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|989c9f8}}: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 59s)
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 12:49 urbanecm@deploy1001: Finished scap: SWAT: {{Gerrit|799224f}}:  {{Gerrit|137a40e}} ([[phab:T241242|T241242]]; [[phab:T243974|T243974]]) (duration: 20m 18s)
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 12:30 vgutierrez: depool cp5004 and reimage as buster - [[phab:T242093|T242093]]
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 12:29 vgutierrez: pooling cp5005 with buster - [[phab:T242093|T242093]]
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 12:28 urbanecm@deploy1001: Started scap: SWAT: {{Gerrit|799224f}}:  {{Gerrit|137a40e}} ([[phab:T241242|T241242]]; [[phab:T243974|T243974]])
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 12:23 vgutierrez: pooling ncredir1001 with buster - [[phab:T243391|T243391]]
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 12:18 _joe_: running puppet, scap pull on mwdebug1001
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 12:17 vgutierrez: upload trafficserver 8.0.5-1wm15 to apt.wm.o (buster) - [[phab:T244538|T244538]]
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 12:08 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 12:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 12:06 vgutierrez: testing ats 8.0.5-1-wm15 on cp4032 - [[phab:T244538|T244538]]
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|014405a}}: Add throttle rules for OSU Editathon and workshop for cawiki, remove expired ones ([[phab:T244608|T244608]], [[phab:T244645|T244645]]) (duration: 01m 03s)
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:57 vgutierrez: depool ncredir1001 and reimage as buster - [[phab:T243391|T243391]]
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 11:57 vgutierrez: pooling ncredir1002 with buster - [[phab:T243391|T243391]]
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 11:43 vgutierrez: pooling cp4027 with buster - [[phab:T242093|T242093]]
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 11:38 vgutierrez: depool ncredir1002 and reimage as buster - [[phab:T243391|T243391]]
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 11:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 11:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 11:22 vgutierrez: depooling cp5011 and cp5005 & reimage as buster - [[phab:T242093|T242093]]
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 11:07 vgutierrez: depool cp4027 & reimage as buster - [[phab:T242093|T242093]]
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:07 vgutierrez: pooling ncredir2001 with buster - [[phab:T243391|T243391]]
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 11:03 vgutierrez: pooling cp4028 with buster - [[phab:T242093|T242093]]
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 10:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 10:47 godog: remove old logs from /var/log/swift on swift hosts
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 10:31 vgutierrez: depool ncredir2001 and reimage as buster - [[phab:T243391|T243391]]
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 10:26 vgutierrez: depool cp4028 & reimage as buster - [[phab:T242093|T242093]]
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 10:14 moritzm: installing sudo security updates for buster
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 08:53 vgutierrez: pooling cp4029 with buster - [[phab:T242093|T242093]]
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 1 to 5 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10364 and previous config saved to /var/cache/conftool/dbconfig/20200210-084446-marostegui.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 08:43 vgutierrez: pooling ncredir2002 with buster - [[phab:T243391|T243391]]
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 08:34 effie: rolling restart php-fpm on labweb[1001-1002].wikimedia.org,mw*.eqiad.wmnet,scandium.eqiad.wmnet, wtp[1025-1048].eqiad.wmnet
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 08:32 effie: update php-apcu on eqiad - [[phab:T236800|T236800]]
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 08:29 effie: rolling restart php-fpm on cloudweb2001-dev.wikimedia.org,mw[2135-2147,2150-2212,2214-2290].codfw.wmnet,wtp[2001-2020].codfw.wmnet
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:23 effie: update php-apcu on codfw - [[phab:T236800|T236800]]
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 07:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 07:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 07:54 moritzm: updating d-i netinst image for Stretch 9.12 point release (which bumped the kernel ABI)
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:29 moritzm: updating d-i netinst image for Buster 10.3 point release (which bumped the kernel ABI)
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 07:09 elukey: restore mw1347's mcrouter settings to its default (proxy threads 10 -> 5)
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Place db1107 - MariaDB 10.4 on s1 with minimal weight - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10363 and previous config saved to /var/cache/conftool/dbconfig/20200210-070140-marostegui.json
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 06:55 vgutierrez: depool ncredir2002 and reimage as buster - [[phab:T243391|T243391]]
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1019', diff saved to https://phabricator.wikimedia.org/P10362 and previous config saved to /var/cache/conftool/dbconfig/20200210-065326-marostegui.json
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10361 and previous config saved to /var/cache/conftool/dbconfig/20200210-065135-marostegui.json
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 06:47 vgutierrez: depool cp4029 & reimage as buster - [[phab:T242093|T242093]]
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019', diff saved to https://phabricator.wikimedia.org/P10360 and previous config saved to /var/cache/conftool/dbconfig/20200210-064553-marostegui.json
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10359 and previous config saved to /var/cache/conftool/dbconfig/20200210-064458-marostegui.json
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 06:39 marostegui: Compress db1124:3318 - this will generate lag on s8 wiki replicas - [[phab:T232446|T232446]]
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10358 and previous config saved to /var/cache/conftool/dbconfig/20200210-063716-marostegui.json
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 06:23 marostegui: Remove partitions from db1099:3311, db1099:3318 [[phab:T239453|T239453]]
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool  db1099:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10357 and previous config saved to /var/cache/conftool/dbconfig/20200210-062112-marostegui.json
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10356 and previous config saved to /var/cache/conftool/dbconfig/20200210-061822-marostegui.json
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10355 and previous config saved to /var/cache/conftool/dbconfig/20200210-061656-marostegui.json
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
* 13:56 moritzm: installing mako security updates
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 12:25 moritzm: installing unzip security updates
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-02-09 ==
== 2022-09-25 ==
* 05:11 cdanis: [[phab:T238305|T238305]] hardreset cp3051
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye


== 2020-02-08 ==
== 2022-09-23 ==
* 19:12 _joe_: set cpufreq governor to performance on mw1328
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 17:04 _joe_: restarted php7.2-fpm on mw1332
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 16:53 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 12.24.27.50
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 16:47 gjg@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Editathon in Charolette (duration: 00m 58s)
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 00:05 Jeff_Green: switched payments.wikimedia.org to codfw datacenter due to [[phab:T244610|T244610]]
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance


== 2020-02-07 ==
== 2022-09-22 ==
* 22:20 jeh: ceph: round 2 OSD failover and recovery testing on cloudcephosd1003.wikimedia.org [[phab:T240718|T240718]]
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 20:47 mutante: OS install on new install_server VMs worked on second attempt, issues are gone. signed puppet certs for install1003.eqiad.wmnet, install2003.codfw.wmnet, initial puppet runs ([[phab:T224576|T224576]])
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 20:42 jeh: ceph: OSD failover and recovery testing on cloudcephosd1003.wikimedia.org [[phab:T240718|T240718]]
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mutante: ganeti: attempting to reinstall install1003 which failed last time
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10350 and previous config saved to /var/cache/conftool/dbconfig/20200207-173850-marostegui.json
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:36 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync InitializeSettings again for lols refs [[phab:T233866|T233866]] (duration: 01m 03s)
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:32 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570929 refs [[phab:T233866|T233866]] (duration: 01m 02s)
* 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10349 and previous config saved to /var/cache/conftool/dbconfig/20200207-172541-marostegui.json
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:22 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back all wikis to 1.35.0-wmf.16 refs [[phab:T233866|T233866]]
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:19 marostegui: Start MySQL on es1019 after onsite maintenance [[phab:T243963|T243963]]
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:55 brennen: end of utc late backport & config window
* 16:38 filippo@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:13 XioNoX: remove MSS clamping from eqiad/eqord/knams/esams
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 16:05 andrew@deploy1001: Finished deploy [horizon/deploy@bc777d6]: Fix for [[phab:T243422|T243422]] (duration: 03m 45s)
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 16:04 vgutierrez: pooling cp4030 with buster - [[phab:T242093|T242093]]
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 16:03 bblack: removing GRE MTU mitigations from cp[135]xxx - [[phab:T232602|T232602]]
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 16:01 andrew@deploy1001: Started deploy [horizon/deploy@bc777d6]: Fix for [[phab:T243422|T243422]]
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 15:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:36 brennen@deploy1002: backport aborted: (duration: 02m 16s)
* 15:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:25 vgutierrez: depool & reimage cp4030 as buster - [[phab:T242093|T242093]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:21 vgutierrez: pooling cp4031 with buster - [[phab:T242093|T242093]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:20 vgutierrez: pooling ncredir3001 running buster - [[phab:T243391|T243391]]
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:18 marostegui: Restart all instances on db1124 and db1125 to pick up a new replication filter - [[phab:T240094|T240094]]
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:11 marostegui: Restart all instances on db2094 and db2095 to pick up a new replication filter - [[phab:T240094|T240094]]
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:43 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: REVERT: Wikibase Client: Fix setting name typo ([[phab:T244529|T244529]]) (duration: 01m 40s)
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 14:43 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=zhwiki --force "Amir Sarabadani (WMDE)" --sysop ([[phab:T244578|T244578]])
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:40 hoo@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 14:38 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase Client: Fix setting name typo ([[phab:T244529|T244529]]) (duration: 01m 20s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:33 vgutierrez: depool and reimage ncredir3001 as buster - [[phab:T243391|T243391]]
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:32 vgutierrez: depool & reimage cp4031 as buster - [[phab:T242093|T242093]]
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:23 vgutierrez: pooling ncredir3002 running buster - [[phab:T243391|T243391]]
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:26 vgutierrez: pooling cp4021 with buster - [[phab:T242093|T242093]]
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 13:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:38 dancy@deploy1002: Started scap: testing
* 12:51 vgutierrez: depool and reimage ncredir3002 as buster - [[phab:T243391|T243391]]
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 12:42 vgutierrez: depool & reimage cp4021 as buster - [[phab:T242093|T242093]]
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 12:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 12:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 11:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 11:57 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 11:25 vgutierrez: pooling ncredir5001 running buster - [[phab:T243391|T243391]]
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 11:24 vgutierrez: pooling cp4022 with buster - [[phab:T242093|T242093]]
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 11:09 akosiaris: undo wikifeeds experiments
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:42 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:36 akosiaris: conduct experiments with stopping/starting uwsgi-ores on ores2001 [[phab:T242705|T242705]]
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:24 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:23 vgutierrez: depool and reimage ncredir5001 as buster - [[phab:T243391|T243391]]
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:14 vgutierrez: depool & reimage cp4022 as buster - [[phab:T242093|T242093]]
* 16:39 dancy@deploy1002: Sync cancelled.
* 10:02 akosiaris: increase capacity for wikifeeds by 50% [[phab:T244535|T244535]]
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 10:02 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 10:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:53 ema: A:mw: increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ [[phab:T241145|T241145]]
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:09 godog: roll restart cassandra instance on restbase-dev
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:03 godog: restart cassandra on restbase-dev1004 to test logging pipeline onboard
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:59 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P10343 and previous config saved to /var/cache/conftool/dbconfig/20200207-085846-marostegui.json
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:54 marostegui: Upgrade db1090:3312, db1090:3317
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10342 and previous config saved to /var/cache/conftool/dbconfig/20200207-085432-marostegui.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10341 and previous config saved to /var/cache/conftool/dbconfig/20200207-084447-marostegui.json
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:44 moritzm: installing libexif security updates
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:21 akosiaris: deploy https://gerrit.wikimedia.org/r/570726 [[phab:T244535|T244535]] to avoid CPU throttling of wikifeeds
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 08:21 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Increase base weight for db1126', diff saved to https://phabricator.wikimedia.org/P10340 and previous config saved to /var/cache/conftool/dbconfig/20200207-075323-marostegui.json
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10339 and previous config saved to /var/cache/conftool/dbconfig/20200207-075234-marostegui.json
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:48 marostegui: Remove revision partitions from db2085:3318 [[phab:T239453|T239453]]
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10338 and previous config saved to /var/cache/conftool/dbconfig/20200207-074511-marostegui.json
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10337 and previous config saved to /var/cache/conftool/dbconfig/20200207-074407-marostegui.json
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10336 and previous config saved to /var/cache/conftool/dbconfig/20200207-074258-marostegui.json
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10335 and previous config saved to /var/cache/conftool/dbconfig/20200207-073130-marostegui.json
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10334 and previous config saved to /var/cache/conftool/dbconfig/20200207-073026-marostegui.json
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10333 and previous config saved to /var/cache/conftool/dbconfig/20200207-063831-marostegui.json
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10332 and previous config saved to /var/cache/conftool/dbconfig/20200207-063402-marostegui.json
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:31 elukey: force a puppet run on all ores[12] nodes
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10331 and previous config saved to /var/cache/conftool/dbconfig/20200207-062731-marostegui.json
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 06:26 marostegui: Reboot db1107 for update - [[phab:T242702|T242702]]
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10330 and previous config saved to /var/cache/conftool/dbconfig/20200207-062502-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10329 and previous config saved to /var/cache/conftool/dbconfig/20200207-062345-marostegui.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10328 and previous config saved to /var/cache/conftool/dbconfig/20200207-062043-marostegui.json
* 04:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:25 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:24 robh: eqsin pdu work ongoing starting now.  ps1-603 swapping per [[phab:T242250|T242250]]
* 00:13 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:11 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-02-06 ==
== 2022-09-21 ==
* 23:44 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:37 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244133|T244133]] [cswikisource] Enable VisualEditor in the Edice namespace (duration: 01m 07s)
* 20:46 tgr_: UTC late deploys done
* 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T159711|T159711]] [[phab:T161365|T161365]] [[phab:T164435|T164435]] [nlwiki] Enable VisualEditor in the Project namespace (duration: 01m 08s)
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 23:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:15 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244405|T244405]] Don't try to assign  to  if it's unset (duration: 01m 07s)
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 22:50 jforrester@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/VisualEditor: [[phab:T242184|T242184]] Change tags method so anon edits will go through (duration: 01m 08s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:42 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 22:18 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:13 mutante: turning mw2271 and mw2163 into canary appservers for codfw, this adds mediawiki-testers shell users and removes scap sql scripts, rest stays as is ([[phab:T242606|T242606]])
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:54 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 21:40 twentyafterfour: train blocked due to serious incident related to deploying the latest branch. Incident documentation: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200206-mediawiki refs [[phab:T233866|T233866]]
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:30 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:05 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 20:52 akosiaris: restart all wikifeeds pods
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:45 akosiaris: restart restbase on restbase1027
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 20:30 twentyafterfour: sync-wikiversions --force
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:30 twentyafterfour@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 20:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 19:45 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244405|T244405]] Set wgLogoHD before adding wordmark (duration: 01m 06s)
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 19:36 bblack: re-pool cp1075 (eqiad text)
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:33 addshore: SWAT done!
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:32 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/WikibaseLexemeCirrusSearch: [[phab:T244479|T244479]] Update namespace for PrefetchingTermLookup & fix tests (duration: 01m 06s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:31 bblack: depool cp1075 (eqiad text) for minor experimentation
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:29 addshore@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Babel/includes/Babel.php: [[phab:T243713|T243713]] Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 19:28 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel/includes/Babel.php: [[phab:T243713|T243713]] Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 2.IS (duration: 01m 06s)
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:23 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 1.CS (duration: 01m 07s)
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:23 cdanis: manual puppet run on netflow1001 looked good; ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "run-puppet-agent --enable 'rollout of {{Gerrit|I60692f0e8}} [[phab:T237587|T237587]] cdanis'"
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 19:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (1/2) (duration: 01m 06s)
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 19:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 19:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere [[phab:T243395|T243395]], sync again for luck (duration: 01m 06s)
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 19:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "disable-puppet 'rollout of {{Gerrit|I60692f0e8}} [[phab:T237587|T237587]] cdanis'"
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 19:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere [[phab:T243395|T243395]] (duration: 01m 07s)
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 19:05 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]] (duration: 01m 10s)
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:01 moritzm: restarting exim on mendelevium to pick up cyrus-sasl security updates
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:58 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:55 moritzm: restarting apache on tungsten/dbmonitor to pick up cyrus-sasl security updates
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 18:53 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8e15868]: Update mobileapps to {{Gerrit|ceeb950}} (duration: 06m 27s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:46 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8e15868]: Update mobileapps to {{Gerrit|ceeb950}}
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 18:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:06 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 17:32 herron: set performance cpu scaling governor on maps*
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:49 vgutierrez: pooling ncredir5002 running buster - [[phab:T243391|T243391]]
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:38 vgutierrez: pooling cp4023 with buster - [[phab:T242093|T242093]]
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:36 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic (duration: 00m 19s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:35 XioNoX: remove AS prepending in esams/knams
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 16:31 bblack: lvs1013 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:30 bblack: lvs1014 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:30 bblack: lvs1015 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:29 bblack: lvs1016 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:28 moritzm: restarting apache on bromine to pick up SASL security updates
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 16:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:22 moritzm: installing cyrus-sasl2 security updates on jessie
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:20 bblack: lvs2001 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:19 bblack: lvs2002 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:19 bblack: lvs2003 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:07 vgutierrez: depool and reimage ncredir5002 as buster - [[phab:T243391|T243391]]
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 16:07 bblack: lvs4005 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:06 bblack: lvs4006 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:06 bblack: lvs4007 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:03 vgutierrez: depool & reimage cp4023 as buster - [[phab:T242093|T242093]]
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:03 vgutierrez: pooling cp4024 with buster - [[phab:T242093|T242093]]
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:59 akosiaris: repool eventgate-analytics/eqiad. Experiment proved the failover wouldn't cause (on its own) a problem. Experiment done.
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 15:58 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:57 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: [[phab:T242705|T242705]] (duration: 04m 35s)
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 15:56 vgutierrez: pooling ncredir4001 running buster - [[phab:T243391|T243391]]
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:55 moritzm: installing qemu security updates
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:54 bblack: lvs5001 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:53 bblack: lvs5002 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul
* 15:53 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: [[phab:T242705|T242705]]
* 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:52 bblack: lvs5003 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 15:50 moritzm: installing python-ecdsa security updates
* 15:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:41 moritzm: installing jsoup security updates
* 15:30 vgutierrez: depool & reimage ncredir4001 as buster - [[phab:T243391|T243391]]
* 15:29 vgutierrez: depool & reimage cp4024 as buster - [[phab:T242093|T242093]]
* 15:28 vgutierrez: pooling ncredir4002 running buster - [[phab:T243391|T243391]]
* 15:27 moritzm: installing sudo security updates on jessie
* 15:23 vgutierrez: pooling cp4025 with buster - [[phab:T242093|T242093]]
* 15:14 ema: A:mw-api: force puppet run to increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ [[phab:T241145|T241145]]
* 15:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 godog: extend graphite1004 / graphite2003 fs +200G
* 14:56 vgutierrez: depool and reimage ncredir4002 as buster - [[phab:T243391|T243391]]
* 14:46 vgutierrez: depool & reimage cp4025 as buster - [[phab:T242093|T242093]]
* 14:16 akosiaris: 20mins in with eventgate-analytics/eqiad depooled from discovery, no issues yet.
* 14:14 ema: run puppet on mw-api-canary to revert nginx keepalive_requests bump [[phab:T241145|T241145]]
* 13:55 marostegui: Stop MySQL on es1019, upgrade and poweroff for on-site maintenance - [[phab:T243963|T243963]]
* 13:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 13:53 akosiaris: depool eqiad eventgate-analytics for testing purposes. Requests will flow to codfw, monitoring https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=now-30m&to=now for issues.
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for onsite maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10321 and previous config saved to /var/cache/conftool/dbconfig/20200206-135157-marostegui.json
* 13:45 XioNoX: rollback deactivate BGP transits on cr3-knams
* 13:34 elukey: repool mw1347 with mcrouter running with 10 proxy threads (was: 5)
* 13:31 XioNoX: reboot cr3-knams
* 13:31 elukey: depool mw1347 to test some mcrouter settings
* 13:27 XioNoX: deactivate BGP transits on cr3-knams
* 13:22 vgutierrez: Enable server session sharing on ats-tls in cp4031 - [[phab:T244464|T244464]]
* 13:10 XioNoX: rollback: deactivate BGP transits on cr2-eqsin
* 13:00 XioNoX: reboot cr2-eqsin for sw upgrade
* 13:00 addshore: SWAT done
* 13:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: resync REVERT Enable EntitySourceBasedFederation for group1 (duration: 01m 07s)
* 12:59 XioNoX: deactivate BGP transits on cr2-eqsin
* 12:58 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: REVERT Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]], due to [[phab:T244479|T244479]] (duration: 01m 07s)
* 12:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]] (duration: 01m 06s)
* 12:46 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel: REVERT Fetch central babel information over SQL query, not API ([[phab:T243726|T243726]]) (duration: 01m 07s)
* 12:44 addshore@deploy1001: sync-file aborted: Fetch central babel information over SQL query, not API ([[phab:T243726|T243726]]) (duration: 01m 04s)
* 12:40 vgutierrez: pooling cp3065 - [[phab:T242093|T242093]]
* 12:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group0 [[phab:T243395|T243395]] (duration: 01m 07s)
* 12:34 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable delayed new upload jobs for MachineVision extension (duration: 01m 08s)
* 12:26 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove handler deleted from the MachineVision extension (duration: 01m 05s)
* 12:25 XioNoX: remove full-duplex statement from eqsin Tata link (not supported on Junos 18, as 10G is full duplex anyway)
* 12:24 cparle@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: Use the wbsetclaim API to add depicts statements (duration: 01m 09s)
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5e1cbb2}}: Enable CX in te, kn, gu, mr and pawiki as a default tool ([[phab:T243271|T243271]], [[phab:T243272|T243272]], [[phab:T243273|T243273]], [[phab:T243274|T243274]], [[phab:T243275|T243275]]) (duration: 01m 09s)
* 11:41 akosiaris: upgrade etherpad-lite on etherpad1002 to 1.8.0-1
* 11:38 kart_: Updated cxserver to 2020-02-05-051751-production ([[phab:T244230|T244230]], [[phab:T234323|T234323]])
* 11:35 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:33 akosiaris: upload etherpad-lite_1.8.0-1 to apt.wikimedia.org buster-wikimedia/main
* 11:31 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:28 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:21 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348". no effect observed
* 10:20 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348"
* 10:19 vgutierrez: Enabling HTTP keepalive between ats-tls and varnish-frontend on cp4031 - [[phab:T244464|T244464]]
* 10:00 vgutierrez: depool and reimage cp3065 as buster - [[phab:T242093|T242093]]
* 09:59 vgutierrez: upload trafficserver 8.0.5-1wm14 to apt.wm.o (buster) - [[phab:T242093|T242093]]
* 09:08 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} (duration: 11m 41s)
* 08:56 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}}
* 08:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} to wdqs1010.eqiad.wmnet (duration: 00m 29s)
* 08:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} to wdqs1010.eqiad.wmnet
* 08:23 marostegui: Reboot dbproxy1012 and dbproxy1014 for upgrade
* 08:18 dcausse: restarting blazegraph on wdqs1006: [[phab:T242453|T242453]]
* 08:17 akosiaris: switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348 to
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10319 and previous config saved to /var/cache/conftool/dbconfig/20200206-065906-marostegui.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10318 and previous config saved to /var/cache/conftool/dbconfig/20200206-065238-marostegui.json
* 06:46 elukey: run puppet on all ores[12]* nodes
* 02:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:42 mutante: ganeti - Creating new VM named install2003.codfw.wmnet in codfw with row=A vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private ([[phab:T244390|T244390]])
* 02:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 02:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:21 mutante: ganeti - Creating new VM named install1003.eqiad.wmnet in eqiad with row=C vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private ([[phab:T244390|T244390]])
* 02:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm


== 2020-02-05 ==
== 2022-09-20 ==
* 23:30 ebernhardson: delete search indices duplicated on multiple clusters for: hywwiki, chrwiktionary, gcrwiki, mnwwiki, noboard_chapterswikimedia, nqowiki, nrmwiki, outreachwiki and srnwiki
* 20:19 cjming: end of UTC late backport window
* 23:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@a51f927]: Update mobileapps to {{Gerrit|a7928fa}} (duration: 10m 48s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@a51f927]: Update mobileapps to {{Gerrit|a7928fa}}
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 22:07 mutante: Gerrit - added ppchelko to 'wmf-deployment' Gerrit group (he is already in deployment admin group) ([[phab:T244389|T244389]])
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:37 arlolra@deploy1001: Finished deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to {{Gerrit|74730a3}} (duration: 03m 07s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:33 arlolra@deploy1001: Started deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to {{Gerrit|74730a3}}
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:31 mutante: killing and restarting wikibugs, it was reporting each update twice
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy (duration: 00m 07s)
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 20:51 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy (duration: 13m 28s)
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 20:50 mutante: ores1004 - systemctl start celery-ores-worker
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 20:45 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 01m 07s)
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 20:44 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]]
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 20:37 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 20:34 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1269.eqiad.wmnet
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 20:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1267.eqiad.wmnet
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:25 mutante: mw1267 restarting php7.2-fpm
* 18:50 jynus: restart db2100:s7 to apply new config
* 20:21 joal@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version (duration: 00m 08s)
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 20:21 joal@deploy1001: Started deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:09 twentyafterfour: Preparing to deploy wmf/1.35.0-wmf.18 to group1 wikis refs [[phab:T233866|T233866]]
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:09 moritzm: installing git security updates for jessie
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 20:00 moritzm: installing unzip security updates
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 19:44 mutante: LDAP - added spramduya to wmf group ([[phab:T243802|T243802]])
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 19:38 jforrester@