Server Admin Log: Difference between revisions

imported>Stashbot
(ejegg: re-enabled dedupe_civicrm_contacts and major_gifts_addresses fundraising jobs)
imported>Stashbot
(sukhe: disable puppet on dns4003 till we resolve the puppet failures)
== 2019-07-26 ==
== 2022-10-05 ==
* 00:53 ejegg: re-enabled dedupe_civicrm_contacts and major_gifts_addresses fundraising jobs
* 00:05 sukhe: disable puppet on dns4003 till we resolve the puppet failures
* 00:51 ejegg: re-enabled donations queue consumer
* 00:15 ejegg: disabled donations queue consumer


== 2019-07-25 ==
== 2022-10-04 ==
* 23:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/extension.json: Fix over-eager GrowthExperiments popups ([[phab:T229045|T229045]]) (duration: 00m 50s)
* 23:09 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[:gerrit:523214{{!}}Revert "Delete Image-reviewer group from commonswiki for good"]] ([[phab:T228098|T228098]]) (duration: 00m 47s)
* 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:520364{{!}}Add sju, sjd, and rmf to wmgExtraLanguageNames]] ([[phab:T226701|T226701]]) (duration: 00m 47s)
* 21:28 cjming: end of UTC late backport window
* 23:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:525580{{!}}Enable VisualEditor in namespace Wikipédia on Slovak Wikipedia]] ([[phab:T229014|T229014]]) (duration: 00m 48s)
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:34 ejegg: re-enabled donations queue consumer
* 21:25 cjming@deploy1002: Finished scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] (duration: 05m 06s)
* 22:07 bblack: lvs1013 - restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:04 bblack: lvs1014 - restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:02 bblack: lvs1015 - restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:02 ejegg: turned off dedupe_civicrm_contacts fundraising job
* 21:21 cjming@deploy1002: cjming and cjming: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:59 bblack: lvs1016 - restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:20 cjming@deploy1002: Started scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]]
* 21:47 bblack: primary high-traffic2 lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:46 XioNoX: apply export BGP_Wikimedia_no_dfz to eqiad's Confed_esams - [[phab:T227808|T227808]]
* 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:40 ejegg: turned off major_gifts_addresses fundraising job
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:38 bblack: primary high-traffic1 lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:07 bblack: backup lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - [[phab:T228190|T228190]]
* 21:07 cjming@deploy1002: Finished scap: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]] (duration: 05m 40s)
* 20:54 hashar: Rebasing mediawiki/extensions/MobileFrontend@wmf/1.34.0-wmf.15 for a build/CI related change to package.json https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/525632/
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:37 XioNoX: add prometheus-bird-exporter to stretch-wikimedia repo
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:01 cjming@deploy1002: cjming and mdsshakil: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:02 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 21:01 cjming@deploy1002: Started scap: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]]
* 19:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], feeds timing out. (duration: 05m 34s)
* 20:59 cjming@deploy1002: Finished scap: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]] (duration: 06m 35s)
* 19:53 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], feeds timing out.
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:53 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], take 3 (duration: 03m 14s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:49 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], take 3
* 20:53 cjming@deploy1002: cjming and trainbranchbot: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 19:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], take 2 (duration: 06m 33s)
* 20:52 cjming@deploy1002: Started scap: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]]
* 19:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:44 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 20:49 cjming@deploy1002: Sync cancelled.
* 19:42 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]], take 2
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:42 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]] (duration: 13m 42s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:29 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html [[phab:T229016|T229016]]
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:42 cjming@deploy1002: cjming and aishik: Backport for [[gerrit:838207{{!}}Add wordmark and tagline for Bengali Wikibooks (T319320)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 19:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 20:41 cjming@deploy1002: Started scap: Backport for [[gerrit:838207{{!}}Add wordmark and tagline for Bengali Wikibooks (T319320)]]
* 19:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:39 cjming@deploy1002: Finished scap: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]] (duration: 14m 29s)
* 19:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:01 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:36 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:25 cjming@deploy1002: cjming and d3r1ck01: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 18:19 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 20:25 cjming@deploy1002: Started scap: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]]
* 18:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:54 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS buster
* 18:19 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:34 mutante: gerrit - deploying puppet refactoring change
* 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:34 tzatziki: removing 1 file for legal compliance
* 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
* 18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:24 tzatziki: removing 1 file for legal compliance
* 18:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:00 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:21 moritzm: installing gdk-pixbuf security updates
* 17:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:58 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@11d9d4a]: Update service-mobileapp-node to {{Gerrit|200a323}} ([[phab:T228938|T228938]] [[phab:T228287|T228287]]) (duration: 04m 39s)
* 18:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:53 mbsantos@deploy1001: Started deploy [mobileapps/deploy@11d9d4a]: Update service-mobileapp-node to {{Gerrit|200a323}} ([[phab:T228938|T228938]] [[phab:T228287|T228287]])
* 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:51 elukey: powercycle stat1007
* 17:59 ejegg: turned fundraising scheduled jobs back on
* 17:44 volans: sudo cumin -s30 -b1 -m async 'A:wdqs-all and not A:wdqs-internal and not P{wdqs1009.eqiad.wmnet}' 'run-puppet-agent -e "volans - [[phab:T228122|T228122]] - deploying gerrit/524954"' 'systemctl restart wdqs-blazegraph'
* 17:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:33 volans: running sudo cumin -s30 -b1 -m async 'A:wdqs-internal' 'run-puppet-agent -e "volans - [[phab:T228122|T228122]] - deploying gerrit/524954"' 'systemctl restart wdqs-blazegraph'
* 17:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]] (duration: 06m 58s)
* 17:18 volans: disabled puppet on A:wdqs-all, deploying gerrit/524954 - [[phab:T228122|T228122]]
* 17:55 moritzm: installing libsndfile security updates
* 17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.rolling-restart-workers (exit_code=0)
* 17:50 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 17:01 elukey@cumin1001: START - Cookbook sre.hadoop.rolling-restart-workers
* 17:50 urbanecm@deploy1002: Started scap: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]]
* 16:54 bblack: lvs5001 - restart pybal for resolv.conf change - [[phab:T228190|T228190]]
* 17:49 ejegg: turned off fundraising scheduled jobs for civi deploy
* 16:53 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/WikibaseMediaInfo/resources/statements/: [[phab:T228807|T228807]] Fix formatValue abort handling (duration: 00m 48s)
* 17:28 tzatziki: removing 4 files for legal compliance
* 16:52 jijiki: Rolling restart of hhvm across the fleet
* 17:04 mutante: gerrit - deployed 832345 - scap and daemon users became decoupled ([[phab:T317412|T317412]])
* 16:50 bblack: lvs5002 - restart pybal for resolv.conf change - [[phab:T228190|T228190]]
* 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:44 bblack: lvs5003 - restart pybal for resolv.conf change - [[phab:T228190|T228190]]
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:19 jijiki: Disable puppet on mw* servers for 525156
* 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:52 jeh: rebooting cloudstore1008.wikimedia.org for updates [[phab:T224228|T224228]]
* 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:41 jeh: rebooting cloudstore1009.wikimedia.org for updates [[phab:T224228|T224228]]
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:41 nuria@deploy1001: Finished deploy [analytics/refinery@f310917]: deploying refinery - migrations to hive2 actions (duration: 13m 40s)
* 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:35 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 16:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:35 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:32 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove redundant wgResourceLoaderStorageEnabled override (duration: 00m 50s)
* 16:25 brennen@deploy1002: Pruned MediaWiki: 1.40.0-wmf.2 (duration: 02m 02s)
* 15:27 nuria@deploy1001: Started deploy [analytics/refinery@f310917]: deploying refinery - migrations to hive2 actions
* 16:24 brennen@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]] (duration: 28m 55s)
* 15:09 jeh: rebooting labstore1004.eqiad.wmnet for updates [[phab:T224228|T224228]]
* 16:21 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns4003.wikimedia.org with OS bullseye
* 14:42 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@87b25f2]: Convert oozie actions from hive to hive2 (duration: 00m 19s)
* 16:03 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 14:42 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@87b25f2]: Convert oozie actions from hive to hive2
* 16:00 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 15:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS buster
* 14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 15:54 brennen@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 14:02 moritzm: installing Java security updates on Druid servers
* 15:53 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 13:52 moritzm: installing Java security updates on AQS, Hadoop and Kafka/Jumbo servers
* 15:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 15:51 brennen: restarting `/usr/bin/scap stage-train --yes auto` after failed staging ([[phab:T314193|T314193]]), cc: ^demon
* 13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 15:48 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:47 sukhe: disable Puppet on A:cp and A:eqiad for [[phab:T309651|T309651]]
* 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 15:42 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 15:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
* 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
* 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS buster
* 13:38 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:35 robh: cloudvirt1015 offline for ram swap via [[phab:T220853|T220853]]
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 fsero: recreating clusterrole deploy from helmfile in staging
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - [[phab:T226851|T226851]]
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.15
* 15:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
* 12:49 marostegui: Drop abuse_filter_log.afl_log_id in s4 codfw (lag will appear on codfw) - [[phab:T226851|T226851]]
* 15:10 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 11:53 marostegui: Compress s3 wikis on labsdb1010 - [[phab:T222978|T222978]]
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2002.codfw.wmnet with OS buster
* 11:03 arturo: update stretch-wikimedia/thirdparty/kubeadm-k8s on install1002 for [[phab:T215531|T215531]] (kubeadm 1.15.1)
* 15:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 10:53 moritzm: rebooting cloudvirt2003-dev
* 15:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 10:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:06 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 10:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:35 moritzm: rebooting cloudvirt1024 for kernel update
* 15:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:21 marostegui: Failover m1 from dbproxy1006 to dbproxy1001 - [[phab:T227139|T227139]]
* 15:02 moritzm: installing snakeyaml security updates
* 08:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 08:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:55 papaul: maintenance complete on msw1-codfw
* 08:54 moritzm: rebooting cloudvirt2001-dev
* 14:51 sukhe: disable Puppet on A:cp and A:esams for [[phab:T309651|T309651]]
* 08:32 Urbanecm: Password reset for SUL user Strejc
* 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
* 08:04 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad,name=mw128[0-3].*
* 14:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
* 08:01 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad,name=mw12(6[89]{{!}}7[0-5]).*
* 14:40 moritzm: installing maven-shared-utils security updates
* 08:01 _joe_: repooling mw1268-1275 in the appserver cluster
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2002.codfw.wmnet with OS buster
* 08:00 moritzm: rebooting cloudvirt2001-dev
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
* 07:59 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad,name=mw12(7[6-9{{!}}8[0-3]).*
* 14:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
* 07:59 _joe_: repooling mw1276-1283 in the API cluster
* 14:30 papaul: ongoing maintenance on msw1-codfw
* 07:33 moritzm: rebooting cloudvirt2001-dev
* 14:29 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 07:23 marostegui: Upgrade MySQL on db1072
* 14:27 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 07:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:22 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 07:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:14 XioNoX: netbox - Move VRRP IPs to FHRP group feature - [[phab:T311218|T311218]]
* 06:42 elukey: restart kafka* on kafka-jumbo1001 to pick up new openjdk-8 version
* 14:13 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 06:37 elukey: restart cassandra instances on aqs1004 to pick up new openjdk-8 version
* 14:12 filippo@cumin1001: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 06:34 elukey: add term eventgate to analytics-in4 on cr1/cr2-eqiad - [[phab:T228882|T228882]]
* 14:12 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/tests/phpunit/: Backport: [[gerrit:838094{{!}}Revert "Introduce LanguageVariantConverter" (T319282)]] (2/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 52s)
* 05:31 twentyafterfour: set phabricator to read-write mode
* 14:12 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 05:30 marostegui: Failover m3 from db1072 to db1128 - [[phab:T228243|T228243]]
* 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/includes/: Backport: [[gerrit:838094{{!}}Revert "Introduce LanguageVariantConverter" (T319282)]] (1/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 43s)
* 05:30 twentyafterfour: phabricator set to read-only mode
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:51 marostegui: Start pre-failover steps on m3 [[phab:T228243|T228243]]
* 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:02 XioNoX: remove peer AS63541 from cr1-eqsin
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/extensions/Kartographer/modules/dialog: Backport: [[gerrit:838097{{!}}Log basic nearby and fullscreen events (T315972, T318678)]] (no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 42s)
* 14:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:55 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 13:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 13:49 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35347 and previous config saved to /var/cache/conftool/dbconfig/20221004-134947-root.json
* 13:49 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 13:48 sukhe: disable Puppet on A:cp and A:eqsin for [[phab:T309651|T309651]]
* 13:47 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:42 awight: EU backport window finished.
* 13:40 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 13:38 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:36 awight@deploy1002: Finished scap: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]] (duration: 06m 49s)
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
* 13:35 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "filippo test - filippo@cumin1001"
* 13:34 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "filippo test - filippo@cumin1001"
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35346 and previous config saved to /var/cache/conftool/dbconfig/20221004-133442-root.json
* 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
* 13:31 jbond: re-enable puppet after deploying puppetmaster change 838144
* 13:30 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
* 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
* 13:30 awight@deploy1002: awight and awight: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:29 awight@deploy1002: Started scap: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]]
* 13:28 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
* 13:27 awight@deploy1002: Finished scap: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]] (duration: 05m 16s)
* 13:24 jbond: disable puppet to deploy a puppetmaster change 838144
* 13:22 awight@deploy1002: awight and stang: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:21 awight@deploy1002: Started scap: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]]
* 13:21 awight@deploy1002: Finished scap: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]] (duration: 12m 48s)
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35345 and previous config saved to /var/cache/conftool/dbconfig/20221004-131937-root.json
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 awight@deploy1002: awight and stang: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:08 awight@deploy1002: Started scap: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]]
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35343 and previous config saved to /var/cache/conftool/dbconfig/20221004-130432-root.json
* 12:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35342 and previous config saved to /var/cache/conftool/dbconfig/20221004-124927-root.json
* 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35341 and previous config saved to /var/cache/conftool/dbconfig/20221004-123422-root.json
* 12:31 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]] (duration: 00m 58s)
* 12:30 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]]
* 12:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 12:26 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]] (duration: 00m 14s)
* 12:26 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]]
* 12:21 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35340 and previous config saved to /var/cache/conftool/dbconfig/20221004-121917-root.json
* 12:14 volans: uploaded python3-gjson_0.1.0 to apt.wikimedia.org bullseye-wikimedia
* 12:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 12:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host sessionstore2001.codfw.wmnet with OS buster
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35339 and previous config saved to /var/cache/conftool/dbconfig/20221004-120413-root.json
* 11:55 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 11:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
* 11:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
* 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 11:22 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 11:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 11:05 jayme: published calico 3.23.3 debian packages in bullseye component/calico323 as well as corresponding docker images - [[phab:T307943|T307943]]
* 11:04 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
* 10:55 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:54 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS buster
* 10:53 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 10:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 135158
* 10:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 135158
* 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9119
* 10:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9119
* 10:41 moritzm: installing expat security updates
* 09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.maps.roll-restart (exit_code=1) rolling restart_daemons on A:maps-codfw
* 09:47 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 09:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 09:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 09:42 jayme: deployed istio-ingressgateway with additional envoy native metrics to wikikube codfw and eqiad
* 09:40 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
* 09:37 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-codfw
* 09:36 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
* 09:36 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
* 09:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 20 hosts
* 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 20 hosts
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35338 and previous config saved to /var/cache/conftool/dbconfig/20221004-093530-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35337 and previous config saved to /var/cache/conftool/dbconfig/20221004-092025-root.json
* 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35336 and previous config saved to /var/cache/conftool/dbconfig/20221004-090520-root.json
* 08:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: php7.2 removal
* 08:55 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: php7.2 removal
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35335 and previous config saved to /var/cache/conftool/dbconfig/20221004-085015-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35334 and previous config saved to /var/cache/conftool/dbconfig/20221004-083511-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35333 and previous config saved to /var/cache/conftool/dbconfig/20221004-082005-root.json
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35332 and previous config saved to /var/cache/conftool/dbconfig/20221004-080500-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P35331 and previous config saved to /var/cache/conftool/dbconfig/20221004-080338-root.json
* 07:52 moritzm: installing libdatetime-timezone-perl updates (catching up with latest timezone changes)
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35330 and previous config saved to /var/cache/conftool/dbconfig/20221004-074955-root.json
* 07:36 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 07:36 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35329 and previous config saved to /var/cache/conftool/dbconfig/20221004-072158-root.json
* 07:16 elukey: restart kafka on kafka-logging1001 to pick up its new PKI TLS cert
* 07:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35328 and previous config saved to /var/cache/conftool/dbconfig/20221004-070653-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35327 and previous config saved to /var/cache/conftool/dbconfig/20221004-065148-root.json
* 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35326 and previous config saved to /var/cache/conftool/dbconfig/20221004-063643-root.json
* 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 25885
* 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 25885
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35325 and previous config saved to /var/cache/conftool/dbconfig/20221004-062138-root.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35324 and previous config saved to /var/cache/conftool/dbconfig/20221004-060633-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35323 and previous config saved to /var/cache/conftool/dbconfig/20221004-055128-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35322 and previous config saved to /var/cache/conftool/dbconfig/20221004-053623-root.json
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2019-07-24 ==
== 2022-10-03 ==
* 23:46 nuria@deploy1001: Finished deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2 (duration: 13m 34s)
* 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:43 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow: Fix JS error when saving Flow board descriptions ([[phab:T228818|T228818]]) (duration: 01m 01s)
* 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
* 23:42 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Fix JS error when saving Flow board descriptions ([[phab:T228818|T228818]]) (duration: 01m 03s)
* 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 23:39 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable homepage for 50% of new users on arwiki ([[phab:T228120|T228120]]) (duration: 00m 58s)
* 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 23:32 nuria@deploy1001: Started deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2
* 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
* 23:30 nuria@deploy1001: Finished deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues) (duration: 18m 10s)
* 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
* 23:22 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage on arwiki ([[phab:T228120|T228120]]) (duration: 00m 55s)
* 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work
* 23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Correct typo in arwiki help panel config ([[phab:T228820|T228820]]) (duration: 00m 57s)
* 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
* 23:12 nuria@deploy1001: Started deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues)
* 19:22 ryankemper: [Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '<nowiki>{"transient":{"cluster.routing.allocation.exclude":{"_host": "","_name": "elastic1066-production-search-psi-eqiad"}}}</nowiki>'`); will restart elasticsearch-psi after shards drain
* 22:41 thcipriani@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 19:15 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 22:36 thcipriani@: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 18:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 22:28 thcipriani@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 18:41 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 21:22 mutante: <+icinga-wm> RECOVERY - Device not healthy -SMART- on restbase-dev1006 is OK: All metrics within thresholds. ([[phab:T224260|T224260]])
* 18:34 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 21:18 cscott@deploy1001: Finished deploy [parsoid/deploy@abd05ab]: Updating Parsoid to {{Gerrit|df1af404}} ([[phab:T227216|T227216]], [[phab:T226523|T226523]], [[phab:T226451|T226451]]) (duration: 18m 35s)
* 18:30 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:16 nuria@deploy1001: Finished deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95 (duration: 03m 54s)
* 18:30 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
* 21:12 nuria@deploy1001: Started deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95
* 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:03 ppchelko@deploy1001: Finished deploy [restbase/deploy@7911f65]: Store PCS endpoints [[phab:T222384|T222384]] (duration: 18m 18s)
* 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:00 cscott@deploy1001: Started deploy [parsoid/deploy@abd05ab]: Updating Parsoid to {{Gerrit|df1af404}} ([[phab:T227216|T227216]], [[phab:T226523|T226523]], [[phab:T226451|T226451]])
* 18:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:45 ppchelko@deploy1001: Started deploy [restbase/deploy@7911f65]: Store PCS endpoints [[phab:T222384|T222384]]
* 18:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:39 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to {{Gerrit|1751a2e}} (duration: 04m 20s)
* 18:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints [[phab:T222384|T222384]] (duration: 01m 34s)
* 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints [[phab:T222384|T222384]]
* 17:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:35 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to {{Gerrit|1751a2e}}
* 17:41 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003
* 20:12 jeh: redirecting dumps.wikimedia.org back to labstore1007.wikimedia.org [[phab:T224228|T224228]]
* 17:41 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4003
* 19:43 ejegg: updated fundraising CiviCRM from {{Gerrit|875ab97742}} to {{Gerrit|121feb5d53}}
* 17:40 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:08 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer on group0 wikis - [[phab:T200751|T200751]] (duration: 00m 55s)
* 17:37 robh@cumin2002: START - Cookbook sre.dns.netbox
* 18:33 cmjohnson1: moving cloudvirt107 to 10G rack [[phab:T228691|T228691]]
* 17:29 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
* 18:19 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/localisation/LocalisationCache.php: {{Gerrit|31d99eb381bc}} (duration: 00m 54s)
* 17:29 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 837727: remove dns4001 for anycast neighbors."
* 18:15 ejegg: updated payments-wiki from {{Gerrit|a28ad541ed}} to {{Gerrit|70b432d309}}
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4001.wikimedia.org
* 18:13 urandom: creating new restbase keyspaces -- [[phab:T228804|T228804]]
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:12 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34.0-wmf.15
* 17:08 robh@cumin2002: START - Cookbook sre.dns.netbox
* 17:14 XioNoX: rollback failover master VIP of ae2.1202 inet6 away from cr1-eqiad - [[phab:T226782|T226782]]
* 17:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4001.wikimedia.org
* 17:10 XioNoX: Add mr1-codfw<->cr1/2-codfw vlan/link config on asw-a-codfw - [[phab:T228112|T228112]]
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:44 jijiki: Rolling puppet-enable and apache reload of jobrunners in codfw
* 16:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 16:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:12 bblack: re-pooling recdns on dns1001 via confctl - [[phab:T226782|T226782]]
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:11 bblack: lvs1014 - restore puppet and resolv.conf contents, restart pybal
* 16:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
* 16:10 bblack: dns1001 - restart recursor and re-enable puppet - [[phab:T226782|T226782]]
* 16:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
* 16:07 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
* 16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
* 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:59 bblack: dns1001 - puppet disable, stop recursor service to kill anycast advert - [[phab:T226782|T226782]]
* 16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:59 bblack: lvs1014 - puppet disable, remove dns1001 from resolv.conf, restart pybal - [[phab:T226782|T226782]]
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:58 XioNoX: failover master VIP of ae2.1202 inet6 away from cr1-eqiad - [[phab:T226782|T226782]]
* 16:24 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] (duration: 04m 16s)
* 15:56 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:56 bblack: depooling recdns on dns1001 via confctl - [[phab:T226782|T226782]]
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:56 bblack: depooling recdns on dns1001 via confctl
* 16:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:47 jijiki: Rolling puppet-enable and apache reload of jobrunners in eqiad
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:44 jeh: rebooting labstore1007.wikimedia.org for updates [[phab:T224228|T224228]]
* 16:20 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 15:42 jijiki: Disable puppet on jobrunners for 525306
* 16:20 urbanecm@deploy1002: Started scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]]
* 15:11 herron: resume ingesting [message] =~ /^SlowTimer/ logs on logstash1007 (as a canary)
* 16:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cae49b85d2d780e34b553789d56d76bac4a62c48}}: throttle: Add throttle rule for 2022-10-06 ([[phab:T319212|T319212]]) (duration: 04m 21s)
* 15:02 XioNoX: re-enable vc link between asw2-a6 and asw2-a7 - [[phab:T228823|T228823]]
* 16:14 sukhe: disable Puppet on cp hosts in codfw: rolling out [[phab:T309651|T309651]]
* 14:58 jeh: unmounting dumps NFS clients from labstore1007.wikimedia.org [[phab:T224228|T224228]]
* 15:15 sukhe: disable Puppet on cp hosts in ulsfo: rolling out [[phab:T309651|T309651]]
* 14:54 XioNoX: cleared vc ports stats on asw2-a-eqiad - [[phab:T228823|T228823]]
* 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35320 and previous config saved to /var/cache/conftool/dbconfig/20221003-151438-root.json
* 14:43 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - [[phab:T226851|T226851]]
* 15:06 papaul: maintenance complete on mr1-esams
* 14:40 marostegui: Drop abuse_filter_log.afl_log_id in s5 codfw (lag will appear on codfw) - [[phab:T226851|T226851]]
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35319 and previous config saved to /var/cache/conftool/dbconfig/20221003-145933-root.json
* 14:31 tarrow@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35318 and previous config saved to /var/cache/conftool/dbconfig/20221003-144428-root.json
* 13:49 robh: rebooting cloudvirt1015 into OS, memory error confirmed. new memory replacement dispatch entered via [[phab:T220853|T220853]]
* 14:35 sukhe: upgrade A:cp and A:drmrs to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 13:31 marostegui: Drop abuse_filter_log.afl_log_id in s2 eqiad - [[phab:T226851|T226851]]
* 14:31 papaul: ongoing maintenance on mr1-esams
* 13:25 robh: rebooting cloudvirt1015 into memtest for dell support repair via [[phab:T220853|T220853]]
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35317 and previous config saved to /var/cache/conftool/dbconfig/20221003-142923-root.json
* 13:06 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.15 (duration: 00m 54s)
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35316 and previous config saved to /var/cache/conftool/dbconfig/20221003-141417-root.json
* 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.15
* 14:08 sukhe: upgrade cp4026, cp4032 to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 12:19 marostegui: Stop haproxy on dbproxy1004 and dbproxy1009 (m4 - eventlogging) - [[phab:T228768|T228768]]
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35315 and previous config saved to /var/cache/conftool/dbconfig/20221003-135912-root.json
* 11:23 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: [[gerrit:525254{{!}}Disable FileImporter source wiki edits (T228851)]] (duration: 00m 54s)
* 13:57 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm2_amd64.changes: [[phab:T309651|T309651]]
* 11:12 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:514672{{!}}Remove Content Translation event logging config]] (part 2/2) (duration: 00m 54s)
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35314 and previous config saved to /var/cache/conftool/dbconfig/20221003-134407-root.json
* 11:10 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: [[gerrit:514672{{!}}Remove Content Translation event logging config]] (part 1/2) (duration: 00m 59s)
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35313 and previous config saved to /var/cache/conftool/dbconfig/20221003-134024-root.json
* 10:04 marostegui: Drop abuse_filter_log.afl_log_id from labswiki (wikitech) and labtestwiki - [[phab:T226851|T226851]]
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35312 and previous config saved to /var/cache/conftool/dbconfig/20221003-132902-root.json
* 09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1082 (duration: 00m 55s)
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35311 and previous config saved to /var/cache/conftool/dbconfig/20221003-132519-root.json
* 08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 into API after upgrade (duration: 00m 55s)
* 13:18 vgutierrez: enforcing origin-form{{!}}asterisk-form for request-target on varnish (could trigger spikes of HTTP 400 errors) - [[phab:T318676|T318676]]
* 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1082 after upgrade (duration: 00m 54s)
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35310 and previous config saved to /var/cache/conftool/dbconfig/20221003-131014-root.json
* 08:40 marostegui: Stop MySQL on db1082 for upgrade
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35308 and previous config saved to /var/cache/conftool/dbconfig/20221003-125509-root.json
* 08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for upgrade (duration: 00m 57s)
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35307 and previous config saved to /var/cache/conftool/dbconfig/20221003-124004-root.json
* 08:35 marostegui: Drop abuse_filter_log.afl_log_id in s2 codfw (lag will appear on codfw) - [[phab:T226851|T226851]]
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35306 and previous config saved to /var/cache/conftool/dbconfig/20221003-122459-root.json
* 07:58 marostegui: Drop abuse_filter_log.afl_log_id from wikidata in eqiad - [[phab:T226851|T226851]]
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35305 and previous config saved to /var/cache/conftool/dbconfig/20221003-120954-root.json
* 07:21 marostegui: Stop MySQL on db1117:3322 to check dbproxy1013 notifications - [[phab:T202367|T202367]]
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P35303 and previous config saved to /var/cache/conftool/dbconfig/20221003-120208-root.json
* 07:10 marostegui: Deploy grants for dbproxy1013 in m2 - [[phab:T202367|T202367]]
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 05:00 marostegui: Stop puppet on dbprov2001 to generate s5 mysqldump manually
* 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 04:52 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 04:51 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 04:50 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 53s)
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35302 and previous config saved to /var/cache/conftool/dbconfig/20221003-115449-root.json
* 04:49 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 04:45 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 04:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: [[phab:T227700|T227700]] (duration: 00m 54s)
* 11:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 04:40 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: (no justification provided) (duration: 00m 56s)
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS buster
* 03:41 tstarling@deploy1001: Synchronized w/fatal-error.php: Adding post-send exception test for [[phab:T228462|T228462]] (duration: 00m 54s)
* 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 03:39 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adding DeferredUpdates log channel ([[phab:T228462|T228462]]) (duration: 00m 56s)
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 02:01 eileen: payments-wiki revision changed from {{Gerrit|224c6b2d7b}} to {{Gerrit|a28ad541ed}}, config revision is {{Gerrit|8dcb77cf22}}
* 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 10:52 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS buster
* 10:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 10:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS buster
* 10:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 10:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 10:39 hnowlan: starting cassandra on reimaged sessionstore1002
* 10:37 _joe_: remove stale druid.svc.eqiad.wmnet certificate from the puppetmaster CA; it was expired anyways
* 10:32 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 10:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 10:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 10:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS buster
* 10:00 hnowlan: c-foreach-nt drain on sessionstore1002
* 10:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 10:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35300 and previous config saved to /var/cache/conftool/dbconfig/20221003-092519-root.json
* 09:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31133
* 09:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31133
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62044
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62044
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35299 and previous config saved to /var/cache/conftool/dbconfig/20221003-091014-root.json
* 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P35297 and previous config saved to /var/cache/conftool/dbconfig/20221003-085840-root.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35296 and previous config saved to /var/cache/conftool/dbconfig/20221003-085509-root.json
* 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12975
* 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12975
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35295 and previous config saved to /var/cache/conftool/dbconfig/20221003-085007-root.json
* 08:40 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5001.eqsin.wmnet
* 08:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35294 and previous config saved to /var/cache/conftool/dbconfig/20221003-084004-root.json
* 08:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3303
* 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35293 and previous config saved to /var/cache/conftool/dbconfig/20221003-083729-root.json
* 08:36 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
* 08:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35292 and previous config saved to /var/cache/conftool/dbconfig/20221003-083502-root.json
* 08:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5001.eqsin.wmnet
* 08:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15557
* 08:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15557
* 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12975
* 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12975
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35291 and previous config saved to /var/cache/conftool/dbconfig/20221003-082459-root.json
* 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30781
* 08:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30781
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35290 and previous config saved to /var/cache/conftool/dbconfig/20221003-082224-root.json
* 08:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39386
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35289 and previous config saved to /var/cache/conftool/dbconfig/20221003-081955-root.json
* 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39386
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35288 and previous config saved to /var/cache/conftool/dbconfig/20221003-080954-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35287 and previous config saved to /var/cache/conftool/dbconfig/20221003-080719-root.json
* 08:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35286 and previous config saved to /var/cache/conftool/dbconfig/20221003-080556-root.json
* 08:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35285 and previous config saved to /var/cache/conftool/dbconfig/20221003-080451-root.json
* 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178', diff saved to https://phabricator.wikimedia.org/P35284 and previous config saved to /var/cache/conftool/dbconfig/20221003-075643-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35283 and previous config saved to /var/cache/conftool/dbconfig/20221003-075449-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35282 and previous config saved to /var/cache/conftool/dbconfig/20221003-075214-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35281 and previous config saved to /var/cache/conftool/dbconfig/20221003-075051-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35280 and previous config saved to /var/cache/conftool/dbconfig/20221003-074946-root.json
* 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16637
* 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16637
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35279 and previous config saved to /var/cache/conftool/dbconfig/20221003-073944-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35278 and previous config saved to /var/cache/conftool/dbconfig/20221003-073709-root.json
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 XioNoX: cr2-drmrs# set chassis fpc 0 sampling-instance pmacct
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35277 and previous config saved to /var/cache/conftool/dbconfig/20221003-073627-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200', diff saved to https://phabricator.wikimedia.org/P35276 and previous config saved to /var/cache/conftool/dbconfig/20221003-073556-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35275 and previous config saved to /var/cache/conftool/dbconfig/20221003-073546-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35274 and previous config saved to /var/cache/conftool/dbconfig/20221003-073441-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35273 and previous config saved to /var/cache/conftool/dbconfig/20221003-072741-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35272 and previous config saved to /var/cache/conftool/dbconfig/20221003-072204-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35271 and previous config saved to /var/cache/conftool/dbconfig/20221003-072122-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35270 and previous config saved to /var/cache/conftool/dbconfig/20221003-072041-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35269 and previous config saved to /var/cache/conftool/dbconfig/20221003-071936-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35268 and previous config saved to /var/cache/conftool/dbconfig/20221003-071236-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35267 and previous config saved to /var/cache/conftool/dbconfig/20221003-070659-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35266 and previous config saved to /var/cache/conftool/dbconfig/20221003-070617-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35265 and previous config saved to /var/cache/conftool/dbconfig/20221003-070536-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35264 and previous config saved to /var/cache/conftool/dbconfig/20221003-070431-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P35263 and previous config saved to /var/cache/conftool/dbconfig/20221003-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35262 and previous config saved to /var/cache/conftool/dbconfig/20221003-065731-root.json
* 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6128
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35261 and previous config saved to /var/cache/conftool/dbconfig/20221003-065154-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35260 and previous config saved to /var/cache/conftool/dbconfig/20221003-065112-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35259 and previous config saved to /var/cache/conftool/dbconfig/20221003-065031-root.json
* 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6128
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P35258 and previous config saved to /var/cache/conftool/dbconfig/20221003-064638-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35257 and previous config saved to /var/cache/conftool/dbconfig/20221003-064226-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35256 and previous config saved to /var/cache/conftool/dbconfig/20221003-063607-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35255 and previous config saved to /var/cache/conftool/dbconfig/20221003-063527-root.json
* 06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11039
* 06:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11039
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35254 and previous config saved to /var/cache/conftool/dbconfig/20221003-062721-root.json
* 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5400
* 06:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5400
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35253 and previous config saved to /var/cache/conftool/dbconfig/20221003-062102-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35252 and previous config saved to /var/cache/conftool/dbconfig/20221003-062022-root.json
* 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
* 06:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35251 and previous config saved to /var/cache/conftool/dbconfig/20221003-061216-root.json
* 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35250 and previous config saved to /var/cache/conftool/dbconfig/20221003-060557-root.json
* 06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35249 and previous config saved to /var/cache/conftool/dbconfig/20221003-055711-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P35248 and previous config saved to /var/cache/conftool/dbconfig/20221003-055401-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35247 and previous config saved to /var/cache/conftool/dbconfig/20221003-055052-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P35246 and previous config saved to /var/cache/conftool/dbconfig/20221003-054245-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35245 and previous config saved to /var/cache/conftool/dbconfig/20221003-054206-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P35244 and previous config saved to /var/cache/conftool/dbconfig/20221003-052927-root.json


== 2019-07-23 ==
== 2022-10-02 ==
* 23:44 eileen: civicrm revision changed from {{Gerrit|88e9f24893}} to {{Gerrit|875ab97742}}, config revision is {{Gerrit|4006d3bdc5}}
* 08:13 elukey: `apt-get clean` on an-airflow1001 to free some space on the root partition
* 23:43 shdubsh: reverting logstash mitigations and re-enable puppet
* 23:42 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/diff/DifferenceEngine.php: [[phab:T228766|T228766]] Don't double wrap rollback links (duration: 00m 56s)
* 23:31 mutante: mw1267 - rm -rf /srv/mediawiki/php-1.33.0-wmf.23 ; rm -rf /srv/mediawiki/php-1.32.0-wmf.3 ; scap pull
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
* 22:36 mutante: rolling out scap 3.11.1-1 on mw-eqiad servers
* 22:14 mutante: continuing rollout of new scap version 3.11.1-1, starting with kafka-all followed by other cumin-alias groups ([[phab:T228328|T228328]])
* 22:06 herron: puppet temporarily disabled on eqiad/codfw logstash collectors while catching up with backlog. see /etc/logstash/conf.d/01-filter_temp_drops.conf
* 21:52 herron: logstash - temporarily dropping logs matching [message] =~ /^SlowTimer/ due to UTF-8 parsing errors that are stopping the logstash processing pipeline. will re-enable after logstash has caught up with the backlog
* 20:59 shdubsh: temporarily disable input-kafka-rsyslog-shipper and drop memcached logs on logstash nodes
* 20:08 paravoid: asw2-a-eqiad: request virtual-chassis vc-port set interface member 6 vcp-255/1/0 disable
* 19:58 eileen: process-control config revision is {{Gerrit|4006d3bdc5}} - disabled drush fill donor totals job
* 19:49 mutante: mwdebug1002 - restarting hhvm - mw1312 - restarted apache
* 19:44 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004
* 19:40 mutante: restarting hhvm on mw1312
* 19:28 cdanis: depool all appservers in eqiad A7 cdanis@cumin1001.eqiad.wmnet ~ 🍵 sudo cumin 'mw12[67-83]*' 'depool'
* 19:11 bblack: repool lvs1013 - [[phab:T227143|T227143]]
* 19:10 bblack: repool cp1077 + cp1078 - [[phab:T227143|T227143]]
* 19:09 elukey: depool mw1261 for investigation
* 19:06 herron: restarting logstash on logstash100[789]
* 18:53 robh: mw1271 had power loss event due to pdu swap via [[phab:T227143|T227143]]
* 18:45 mutante: rolling out scap 3.11.1-1 on all mw codfw servers ([[phab:T228328|T228328]])
* 18:43 mutante: rolling out scap 3.11.1-1 on mw canary servers ([[phab:T228328|T228328]])
* 18:13 robh: started depooling servers in a7-eqiad for pdu work via [[phab:T227143|T227143]]
* 18:11 cdanis: depool mw1267
* 18:10 cdanis: cdanis@mw1267.eqiad.wmnet /srv/mediawiki ☕ scap pull
* 18:09 cdanis: cdanis@mw1267.eqiad.wmnet ~ ☕ sudo apt install python-concurrent.futures
* 18:08 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] Make XmlDumpwriter resilient to blob store corruption (duration: 00m 54s)
* 18:07 James_F: Belay that, error on mw1267.
* 18:06 James_F: Sync error on mw1314.eqiad.wmnet, No module named concurrent.futures
* 18:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228720|T228720]] Make XmlDumpwriter resilient to blob store corruption (duration: 00m 57s)
* 18:05 bblack: lvs1013 - disable puppet and stop pybal - [[phab:T227143|T227143]]
* 18:04 bblack: depool cp1077 + cp1088 - [[phab:T227143|T227143]]
* 18:03 cdanis@deploy1001: Synchronized docroot/noc/db.php: {{Gerrit|8def4af1d}} noc db.php: include readonly status & group loads (duration: 00m 55s)
* 17:52 moritzm: installing Java security updates on kafka/main and Logstash servers
* 17:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema [[phab:T226522|T226522]], step 2 (duration: 01m 37s)
* 17:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema [[phab:T226522|T226522]], step 2
* 17:00 ppchelko@deploy1001: Finished deploy [changeprop/deploy@894f735]: Switch internal events to the new schema [[phab:T226522|T226522]] (duration: 01m 30s)
* 16:58 ppchelko@deploy1001: Started deploy [changeprop/deploy@894f735]: Switch internal events to the new schema [[phab:T226522|T226522]]
* 16:22 godog: pool prometheus1003 - [[phab:T227139|T227139]]
* 15:46 robh: side b of a5-eqiad swapping pdu via [[phab:T227141|T227141]]
* 15:14 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 15:08 _joe_: uninstalling php-pear, php-mail, php-mail-mime from mw1267 [[phab:T195364|T195364]]
* 14:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]], attempt 2 (duration: 13m 08s)
* 14:39 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]], attempt 2
* 14:14 robh: a3-eqiad pdu swap taking place now via [[phab:T227139|T227139]]
* 13:47 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 13:45 godog: depool restbase1016 restbase1019 restbase1011 restbase1010 prometheus1003 ahead of PDU work - [[phab:T227139|T227139]]
* 13:45 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 13:44 moritzm: installing Java security updates on furud/flerovium
* 13:43 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 13:27 jeh: dumps switching active vps to labstore1006 [[phab:T224228|T224228]]
* 13:17 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
* 13:06 marostegui: Drop abuse_filter_log.afl_log_id from s8 codfw (lag will happen on codfw s8) - [[phab:T226851|T226851]]
* 12:33 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (duration: 29m 46s)
* 12:04 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache
* 12:02 akosiaris: drain kubernetes1001. [[phab:T227139|T227139]]
* 12:01 akosiaris: empty ganeti1007 from running instances. [[phab:T227139|T227139]]
* 11:59 akosiaris: enable disable poolcounter1003, switchover codfw poolcounters [[phab:T224572|T224572]]
* 11:58 tarrow: EU SWAT finished
* 11:58 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 46s)
* 11:56 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:525065{{!}}T214902 Fix missing /termbox in SSRTermboxServerUrl]] (duration: 00m 44s)
* 11:54 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.10 (duration: 07m 55s)
* 11:43 jijiki: restart php-fpm on mwdebug*
* 11:25 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:525062{{!}}T214902 Enable termbox on testwikidatawiki]] (duration: 01m 37s)
* 11:08 jijiki: enable puppet on jobrunners
* 10:17 marostegui: Drop abuse_filter_log.afl_log_id from db1096:3316, db1139:3316 and dbstore1005:3316 [[phab:T226851|T226851]]
* 10:02 moritzm: installing Java security updates on notebook/stat hosts
* 09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
* 09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:53 marostegui: Drop abuse_filter_log.afl_log_id from s6 codfw with replication (this will cause lag in s6 codfw) - [[phab:T226851|T226851]]
* 09:51 akosiaris: enable poolcounter1005, disable poolcounter1001 [[phab:T224572|T224572]]
* 09:51 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
* 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool into API db1100 after upgrade (duration: 00m 46s)
* 09:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool into API db1100 after upgrade (duration: 00m 47s)
* 09:09 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
* 09:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1100 after upgrade (duration: 00m 46s)
* 08:34 marostegui: Upgrade db1100
* 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 for upgrade (duration: 00m 53s)
* 08:08 marostegui: Stop MySQL on db2044 to test dbproxy2002 notifications - [[phab:T202367|T202367]]
* 07:31 marostegui: Deploy grants for dbproxy2002 on m2 - [[phab:T202367|T202367]]
* 04:52 eileen: civicrm revision changed from {{Gerrit|d951b07ce3}} to {{Gerrit|88e9f24893}}, config revision is {{Gerrit|f7b7622e27}}
* 04:43 marostegui: Failover m1 from dbproxy1001 to dbproxy1006 [[phab:T227139|T227139]]
* 00:06 Urbanecm: slwiki updateCollation.php completed ([[phab:T208984|T208984]])


== 2019-07-22 ==
== 2022-10-01 ==
* 23:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 524952 Increase hewiki rollback limit for patrollers to 50/60 (duration: 00m 48s)
* 13:24 fab@deploy1002: Finished deploy [airflow-dags/research@44a1158]: (no justification provided) (duration: 00m 08s)
* 23:54 Urbanecm: Run mwscript importImages.php --wiki=commonswiki --user=Meisam /home/urbanecm/T223052
* 13:24 fab@deploy1002: Started deploy [airflow-dags/research@44a1158]: (no justification provided)
* 23:42 Urbanecm: All updateCollation.php runs completed, except the one for slwiki ([[phab:T208984|T208984]])
* 13:12 fab@deploy1002: Finished deploy [airflow-dags/research@d6b3e82]: (no justification provided) (duration: 03m 35s)
* 23:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add flood group to ptwiki ([[phab:T228521|T228521]]) (duration: 00m 47s)
* 13:08 fab@deploy1002: Started deploy [airflow-dags/research@d6b3e82]: (no justification provided)
* 23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwiktionary --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiversity --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:37 Urbanecm: Run mwscript updateCollation.php --wiki=slwikisource --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix comment in IS.php (noop, [[phab:T227000|T227000]]) (duration: 00m 46s)
* 23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiquote --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikibooks --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:33 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: [[:gerrit:524704{{!}}Fix "Remove "עמוד" namespace from wgFlaggedRevsNamespaces for hewikisource"]] ([[phab:T227000|T227000]]) (duration: 00m 47s)
* 23:29 Urbanecm: Run mwscript updateCollation.php --wiki=slwiki --previous-collation=uppercase ([[phab:T208984|T208984]])
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgCategoryCollation to uca-sl-u-kn on Slovene projects (sl) ([[phab:T208984|T208984]]) (duration: 00m 47s)
* 22:11 mutante: dropped zero.wikiMedia.org from DNS ([[phab:T187716|T187716]])
* 21:50 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for [[phab:T227416|T227416]] (duration: 00m 46s)
* 21:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate [[phab:T211248|T211248]] (duration: 13m 01s)
* 21:35 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Temporary make account creation limits more restrictive" (duration: 00m 47s)
* 21:27 eileen: civicrm revision is {{Gerrit|d951b07ce3}}, config revision is {{Gerrit|f7b7622e27}}
* 21:25 ppchelko@deploy1001: Started deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate [[phab:T211248|T211248]]
* 21:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]] (duration: 16m 14s)
* 21:21 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:20 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:19 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:17 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 21:05 eileen: civicrm revision changed from {{Gerrit|f932e56cd2}} to {{Gerrit|d951b07ce3}}, config revision is {{Gerrit|f7b7622e27}}
* 21:04 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate [[phab:T211248|T211248]]
* 20:04 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0be6045]: Weekly deploy (duration: 18m 42s)
* 19:46 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0be6045]: Weekly deploy
* 19:09 ppchelko@deploy1001: Finished deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate [[phab:T211248|T211248]] (duration: 01m 31s)
* 19:07 ppchelko@deploy1001: Started deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate [[phab:T211248|T211248]]
* 18:59 elukey: repool scb1001 after pdu maintenance
* 18:59 herron: repooling kafka1001 [[phab:T227140|T227140]]
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable help panel for 50% of new users on arwiki ([[phab:T226729|T226729]]) (duration: 00m 47s)
* 18:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Trying the last sync again, because it's appearing inconsistently (duration: 00m 47s)
* 18:15 thcipriani: restarting gerrit due to [[phab:T224448|T224448]]
* 18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments help panel on arwiki ([[phab:T226729|T226729]]) (duration: 00m 48s)
* 18:00 elukey: arm keyholder on netmon1002 after power loss
* 17:35 elukey: depool scb1001 for PDU work [[phab:T227140|T227140]]
* 17:22 herron: depooling kafka1001 for PDU work [[phab:T227140|T227140]]
* 17:17 nuria@deploy1001: Finished deploy [analytics/refinery@d889893]: deploying refinery jar bump for webrequest/load jobs (duration: 14m 51s)
* 17:02 nuria@deploy1001: Started deploy [analytics/refinery@d889893]: deploying refinery jar bump for webrequest/load jobs
* 17:02 jijiki: enable puppet on all jobrunners
* 16:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T87899|T87899]] Use wfLoadExtension for Collection rather than deprecated entry point (duration: 00m 47s)
* 16:48 jforrester@deploy1001: Synchronized wmf-config/extension-list: Load Collection i18n via extension.json directly (duration: 00m 47s)
* 16:36 jeh: redirecting dumps.wikimedia.org dns to labstore1006 [[phab:T224228|T224228]]
* 15:49 jijiki: Rolling depool and pool of mw1293, mw1294, mw1295, mw1296, mw1299 - [[phab:T219148|T219148]]
* 15:38 marostegui: Stop mysql and power off pc2010 for on-site maintenance - [[phab:T227552|T227552]]
* 15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Wikibase/lib/WikibaseLib.php: [[phab:T227814|T227814]] Wikibase: Define $wgMessagesDirs in WikibaseLib PHP entry point (duration: 00m 48s)
* 15:27 jijiki: Depool mw1300 and pool back
* 15:24 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: [[phab:T228614|T228614]] XmlDumpWriter: don't load revision text content unless requested to (duration: 00m 48s)
* 15:17 jijiki: Disable puppet on jobrunners to enable php7_only
* 14:55 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 14:53 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 14:44 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:38 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:30 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 14:30 ottomata: deploying refactored eventgate chart using eventgate-wikimedia image to eventgate-* services - [[phab:T226668|T226668]]
* 14:28 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
* 13:12 kart_: Updated cxserver to 2019-07-17-074415-production ([[phab:T227553|T227553]], [[phab:T216812|T216812]])
* 13:07 kartik@deploy1001: scap-helm cxserver finished
* 13:07 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
* 13:07 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 13:02 kartik@deploy1001: scap-helm cxserver finished
* 13:02 kartik@deploy1001: scap-helm cxserver cluster codfw completed
* 13:02 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 13:00 kartik@deploy1001: scap-helm cxserver finished
* 13:00 kartik@deploy1001: scap-helm cxserver cluster staging completed
* 12:59 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 12:58 marostegui: Stop MySQL on db1117:3321 to test dbproxy1014 (replacement for dbproxy1006) on m1 - [[phab:T202367|T202367]]
* 12:22 moritzm: installing debian-archive-keyring Stretch update (SUA 164)
* 11:20 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:524685{{!}}Enable wgNamespacesWithSubpages on main NS for kowikiversity (T228481)]] (duration: 00m 54s)
* 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: [[gerrit:523661{{!}}Enable FileImporter source wiki edit and delete, (remove labs customizations) (T225617, T226532)]] (duration: 00m 54s)
* 11:13 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:523661{{!}}Enable FileImporter source wiki edit and delete (T225617, T226532)]] (duration: 00m 56s)
* 10:55 jijiki: Enable puppet on jobrunners
* 10:27 jijiki: Depool and pool mw1300
* 10:23 jijiki: Disable puppet on jobrunners for 524336 - [[phab:T219148|T219148]]
* 10:21 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 10:20 fsero: deploy coredns in staging [[phab:T226516|T226516]]
* 09:47 elukey: failover + restart of Hadoop HDFS namenode on an-master1001 to apply GC settings - [[phab:T228620|T228620]]
* 09:40 marostegui: Deploy grants on m1 to allow connections from dbproxy1014 - [[phab:T202367|T202367]]
* 09:32 elukey: restart hadoop hdfs namenode on an-master1002 to apply new GC settings - [[phab:T228620|T228620]]
* 08:33 marostegui: Rename table enwiki.math on db2116 [[phab:T196055|T196055]]
* 07:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1134 after schema change [[phab:T226851|T226851]] (duration: 00m 51s)
* 07:54 elukey: sudo -i depool on elastic1046 - broken disk (srv partition not available) - [[phab:T228606|T228606]]
* 07:40 elukey: systemctl reset-failed restbase on restbase1007->15 (decommed nodes)
* 07:27 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db1134 [[phab:T226851|T226851]]
* 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1134 for schema change [[phab:T226851|T226851]] (duration: 00m 56s)
* 07:17 moritzm: installing openjdk-11 security updates
* 06:47 marostegui: Stop MySQL on db2062 to test dbproxy2001 notification [[phab:T202367|T202367]]
* 06:23 elukey: restart hadoop-hdfs-namenode on an-master1002 to verify if out-of-the-ordinary GC activity
* 06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1104 from s8 API (duration: 00m 55s)
* 05:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1109 into API (duration: 00m 58s)
* 05:24 marostegui: Compress more tables on labsdb1009 - [[phab:T222978|T222978]]
* 04:48 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/extension.json: fixing UBN [[phab:T228465|T228465]] (duration: 00m 54s)
* 04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/maintenance/loadExitNodes.php: fixing UBN [[phab:T228465|T228465]] (duration: 00m 54s)
* 04:44 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/includes/TorExitNodes.php: fixing UBN [[phab:T228465|T228465]] (duration: 00m 56s)
* 04:17 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: fix UBN bug [[phab:T227772|T227772]] (duration: 00m 56s)


== 2019-07-21 ==
== 2022-09-30 ==
* 01:06 Urbanecm: Deployed patch for [[phab:T228574|T228574]]
* 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
* 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
* 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
* 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
* 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 13:51 moritzm: installing puppetdb-test2001 [[phab:T318931|T318931]]
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and previous config saved to /var/cache/conftool/dbconfig/20220930-123123-root.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35228 and previous config saved to /var/cache/conftool/dbconfig/20220930-121618-root.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35227 and previous config saved to /var/cache/conftool/dbconfig/20220930-120113-root.json
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetdb-test2001.codfw.wmnet
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35226 and previous config saved to /var/cache/conftool/dbconfig/20220930-114605-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35225 and previous config saved to /var/cache/conftool/dbconfig/20220930-113101-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P35224 and previous config saved to /var/cache/conftool/dbconfig/20220930-112307-root.json
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetdb-test2001.codfw.wmnet on all recursors
* 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache puppetdb-test2001.codfw.wmnet on all recursors
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb-test2001.codfw.wmnet
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35223 and previous config saved to /var/cache/conftool/dbconfig/20220930-104004-ladsgroup.json
* 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35222 and previous config saved to /var/cache/conftool/dbconfig/20220930-103943-ladsgroup.json
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35221 and previous config saved to /var/cache/conftool/dbconfig/20220930-102436-ladsgroup.json
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35220 and previous config saved to /var/cache/conftool/dbconfig/20220930-100930-ladsgroup.json
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35219 and previous config saved to /var/cache/conftool/dbconfig/20220930-095423-ladsgroup.json
* 09:42 moritzm: installing Linux 5.10.140 updates on Bullseye hosts (released via 11.5 point release), just rollout of the package, no reboots involved
* 07:37 XioNoX: add RPKI ROAs for 185.71.138.0/24 and 2001:67c:930::/48
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
* 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52320
* 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 52320
* 07:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
* 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35218 and previous config saved to /var/cache/conftool/dbconfig/20220930-070454-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35217 and previous config saved to /var/cache/conftool/dbconfig/20220930-065844-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35216 and previous config saved to /var/cache/conftool/dbconfig/20220930-064949-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35215 and previous config saved to /var/cache/conftool/dbconfig/20220930-064339-root.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35214 and previous config saved to /var/cache/conftool/dbconfig/20220930-063444-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35213 and previous config saved to /var/cache/conftool/dbconfig/20220930-062834-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35212 and previous config saved to /var/cache/conftool/dbconfig/20220930-061939-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35211 and previous config saved to /var/cache/conftool/dbconfig/20220930-061329-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35210 and previous config saved to /var/cache/conftool/dbconfig/20220930-060434-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35209 and previous config saved to /var/cache/conftool/dbconfig/20220930-055824-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35208 and previous config saved to /var/cache/conftool/dbconfig/20220930-054929-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35207 and previous config saved to /var/cache/conftool/dbconfig/20220930-054319-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35206 and previous config saved to /var/cache/conftool/dbconfig/20220930-053424-root.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35204 and previous config saved to /var/cache/conftool/dbconfig/20220930-052814-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35203 and previous config saved to /var/cache/conftool/dbconfig/20220930-051919-root.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35202 and previous config saved to /var/cache/conftool/dbconfig/20220930-051309-root.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P35201 and previous config saved to /var/cache/conftool/dbconfig/20220930-051206-root.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P35200 and previous config saved to /var/cache/conftool/dbconfig/20220930-050533-root.json
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35199 and previous config saved to /var/cache/conftool/dbconfig/20220930-041937-ladsgroup.json
* 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35198 and previous config saved to /var/cache/conftool/dbconfig/20220930-041916-ladsgroup.json
* 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35197 and previous config saved to /var/cache/conftool/dbconfig/20220930-040409-ladsgroup.json
* 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35196 and previous config saved to /var/cache/conftool/dbconfig/20220930-034903-ladsgroup.json
* 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35195 and previous config saved to /var/cache/conftool/dbconfig/20220930-033356-ladsgroup.json
* 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye


== 2019-07-19 ==
== 2022-09-29 ==
* 22:36 mutante: phab2001 - switching apache to php-fpm and worker instead of mpm-prefork (to match phab1001) ([[phab:T190568|T190568]] [[phab:T137928|T137928]] [[phab:T190572|T190572]])
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
* 21:57 eileen: update process control; process-control config revision is {{Gerrit|c913a5f261}}
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
* 21:34 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
* 21:25 eileen: civicrm revision changed from {{Gerrit|21d3c5a3fc}} to {{Gerrit|f932e56cd2}}, config revision is {{Gerrit|9f7eba2193}}
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
* 19:35 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
* 19:35 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 19:34 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 19:07 eevans@: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 21:43 sukhe: alert1001: restart icinga
* 19:02 eevans@: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:53 cdanis@deploy1001: Synchronized docroot/noc/db.php: noc: db.php: support ?dc=codfw, and cleanups (duration: 00m 56s)
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:44 XioNoX: change netflow target port to 2055 in eqiad
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:17 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:55 moritzm: rebooting mw2164 for a test
* 21:26 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
* 15:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
* 15:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 21:18 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:40 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 21:18 ejegg: payments-wiki upgraded from {{Gerrit|839d6dde}} to {{Gerrit|aeee9676}}
* 15:27 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 21:14 robh@cumin2002: START - Cookbook sre.dns.netbox
* 15:26 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
* 21:14 brennen: end of UTC late backport and config window
* 15:22 fsero: deploy coredns in staging [[phab:T226516|T226516]]
* 21:14 brennen@deploy1002: Finished scap: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]] (duration: 08m 22s)
* 15:03 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Collection/Collection.php: {{Gerrit|90eed0fad}} / [[phab:T87899|T87899]] (duration: 00m 54s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:35 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/Collection/Collection.php: {{Gerrit|66ce154d7d734209c76a62cf}} / [[phab:T87899|T87899]] (duration: 00m 56s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:29 ariel@deploy1001: Finished deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps (duration: 00m 03s)
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:29 ariel@deploy1001: Started deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps
* 21:06 brennen@deploy1002: brennen and ebernhardson: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:28 Krinkle: krinkle@deploy1001: Untracked file found in php-1.34-wmf.13
* 21:05 brennen@deploy1002: Started scap: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]]
* 14:28 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34-wmf.13 and php-1.34-wmf.14
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:04 moritzm: installing bzip2 security updates on jessie
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:28 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:56 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 20:59 ryankemper: [[phab:T313431|T313431]] Repooled `elastic[2073-2074,2080-2081,2083,2086].codfw.wmnet`. Codfw's all on 5 masters now and cluster is back to green.
* 10:55 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 20:58 brennen@deploy1002: Sync cancelled.
* 10:53 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 20:58 brennen@deploy1002: brennen and trainbranchbot: Backport for [[gerrit:836928{{!}}Revert "cirrus: Don't configure cloud clusters for private wikis"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 10:53 fsero: deploying calico from helmfile in staging [[phab:T227775|T227775]]
* 20:58 ryankemper: [[phab:T313431|T313431]] Updated cross-cluster seed conf with new masters; should resolve the settings check alerts
* 10:35 jijiki: enable puppet on jobrunners
* 20:58 brennen@deploy1002: Started scap: Backport for [[gerrit:836928{{!}}Revert "cirrus: Don't configure cloud clusters for private wikis"]]
* 10:26 jijiki: disable puppet on jobrunners for 523908
* 20:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4027.ulsfo.wmnet
* 08:37 ariel@deploy1001: Finished deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default (duration: 00m 04s)
* 20:57 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:37 ariel@deploy1001: Started deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:24 gehel: repooling wdqs2004 - [[phab:T228122|T228122]]
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:22 gehel: repooling wdqs2003 - [[phab:T228122|T228122]]
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:20 vgutierrez: restart pybal on lvs2003
* 20:52 brennen@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.3" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.gcoIZ0BTKW"' returned non-zero exit status 255. (duration: 00m 00s)
* 08:16 vgutierrez: restart pybal on lvs2006
* 20:52 brennen@deploy1002: Started scap: Backport for [[gerrit:836886{{!}}cirrus: Don't configure cloud clusters for private wikis]]
* 08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1109 into API (duration: 00m 54s)
* 20:49 robh@cumin2002: START - Cookbook sre.dns.netbox
* 07:57 moritzm: installing idp1001 [[phab:T228403|T228403]]
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:38 moritzm: rebooting tungsten for kernel update
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:46 brennen@deploy1002: Sync cancelled.
* 07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 20:45 brennen@deploy1002: brennen and trainbranchbot: Backport for [[gerrit:836922{{!}}Revert "Add Nepalese Wikipedia tagline"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:45 brennen@deploy1002: Started scap: Backport for [[gerrit:836922{{!}}Revert "Add Nepalese Wikipedia tagline"]]
* 07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 20:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
* 07:03 elukey: restart php-fpm on mw1330 - op-cache hit ratio low
* 20:42 brennen@deploy1002: Sync cancelled.
* 07:02 jynus: reloading dbproxy1004/9
* 20:41 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:836880{{!}}Add Nepalese Wikipedia tagline (T318737)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:01 elukey: depool wdqs2004 from all services (waiting for maintenance)
* 20:41 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic2080` to pick up new master-eligible status
* 06:32 legoktm@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: [[phab:T225199|T225199]] (duration: 00m 55s)
* 20:41 brennen@deploy1002: Started scap: Backport for [[gerrit:836880{{!}}Add Nepalese Wikipedia tagline (T318737)]]
* 06:30 legoktm@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: [[phab:T225199|T225199]] (duration: 00m 55s)
* 20:38 brennen@deploy1002: Finished scap: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]] (duration: 08m 03s)
* 06:15 elukey: clear opcache on mwdebug*
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:26 fsero: repool ms-fe2005 - [[phab:T228196|T228196]]
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2116 (duration: 00m 55s)
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:11 eileen: I think I didn't push the "turn it on" commit - tried again; process-control config revision is {{Gerrit|9f7eba2193}}
* 20:35 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
* 03:03 eileen: process-control config revision is {{Gerrit|7598dc1bf9}} (jobs reenabled)
* 20:35 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cp4027.ulsfo.wmnet
* 01:52 XioNoX: enable outbound sampling on eqiad's router
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:52 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Add even more severe rate limits for eswikiquote and some other, smaller wikis ([[phab:T227416|T227416]]) (duration: 00m 58s)
* 20:33 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
* 00:38 mutante: mwmaint2001 - puppet fails - not removing a bunch of log dirs for maintenance crons
* 20:30 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 00:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
* 20:30 brennen@deploy1002: Started scap: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]]
* 00:08 eileen: process-control config revision is {{Gerrit|7598dc1bf9}} - jobs disabled
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:04 mutante: install1002 - exported indices for new scap version - copied back from buster to stretch - upgraded scap version on mw2250 - scap pull now works and starts to rsync ([[phab:T228482|T228482]], [[phab:T228328|T228328]], [[phab:T226948|T226948]])
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]] (duration: 08m 05s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:19 hoo: Ran foreachwikiindblist wikidataclient-test extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php
* 20:17 ejegg: payments-wiki upgraded from {{Gerrit|0456850e}} to {{Gerrit|839d6dde}} (with cache prefix altered for moved classes)
* 20:17 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic2086` to pick up new master-eligible status
* 20:17 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:17 brennen@deploy1002: Started scap: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]]
* 20:04 ejegg: payments-wiki rolled back from {{Gerrit|839d6dde}} to {{Gerrit|0456850e}}
* 19:56 ejegg: payments-wiki upgraded from {{Gerrit|0456850e}} to {{Gerrit|839d6dde}}
* 19:55 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic208[1,3]` to pick up new master-eligible status
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
* 19:33 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic207[3,4]` to pick up new master-eligible status
* 19:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 19:29 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 19:09 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4021.ulsfo.wmnet
* 19:09 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:04 robh@cumin2002: START - Cookbook sre.dns.netbox
* 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4021.ulsfo.wmnet
* 18:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:06 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35188 and previous config saved to /var/cache/conftool/dbconfig/20220929-162812-ladsgroup.json
* 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35187 and previous config saved to /var/cache/conftool/dbconfig/20220929-162750-ladsgroup.json
* 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35186 and previous config saved to /var/cache/conftool/dbconfig/20220929-161244-ladsgroup.json
* 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35185 and previous config saved to /var/cache/conftool/dbconfig/20220929-155737-ladsgroup.json
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:836858{{!}}Configure `mul` Wikibase language code on Beta wikis]] (beta-only, prod noop) (duration: 03m 41s)
* 15:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35184 and previous config saved to /var/cache/conftool/dbconfig/20220929-154231-ladsgroup.json
* 15:35 dancy@deploy1002: Installation of scap version "4.25.0" completed for 561 hosts
* 15:35 dancy@deploy1002: Installing scap version "4.25.0" for 561 hosts
* 14:30 moritzm: installing glib2.0 security updates
* 14:29 moritzm: uploaded glib2.0 2.50.3-2+deb9u3+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 14:17 moritzm: rolling restart of apache2 in mw/eqiad to pick up Expat security updates
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11164
* 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11164
* 13:54 claime: Enabled puppet for C:memcache hosts following merge [[gerrit:835585{{!}}C:memcached Fix memcached bootstrap]]
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 32934
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35179 and previous config saved to /var/cache/conftool/dbconfig/20220929-134844-root.json
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:46 claime: Disabling puppet for C:memcache hosts to merge [[gerrit:835585{{!}}C:memcached Fix memcached bootstrap]]
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 13:41 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]] (duration: 06m 13s)
* 13:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8966
* 13:39 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8966
* 13:35 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and hoo: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:34 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]]
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35178 and previous config saved to /var/cache/conftool/dbconfig/20220929-133339-root.json
* 13:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]] (duration: 05m 36s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kemayo: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:27 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]]
* 13:26 moritzm: restarting Apache on lists
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]] (duration: 05m 23s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35176 and previous config saved to /var/cache/conftool/dbconfig/20220929-131834-root.json
* 13:15 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:15 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]]
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35175 and previous config saved to /var/cache/conftool/dbconfig/20220929-131507-root.json
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:11 moritzm: rolling restart of apache2 in mw/codfw to pick up Expat security updates
* 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835291{{!}}votewiki: Change wgLanguageCode to zh for Sep 2022 admins election (T318147)]] (duration: 03m 40s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35174 and previous config saved to /var/cache/conftool/dbconfig/20220929-130329-root.json
* 13:01 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 04m 04s)
* 13:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35173 and previous config saved to /var/cache/conftool/dbconfig/20220929-130003-root.json
* 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:57 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35172 and previous config saved to /var/cache/conftool/dbconfig/20220929-124824-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35171 and previous config saved to /var/cache/conftool/dbconfig/20220929-124458-root.json
* 12:44 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]] (duration: 09m 05s)
* 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:35 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 12:34 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]]
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35169 and previous config saved to /var/cache/conftool/dbconfig/20220929-123319-root.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35168 and previous config saved to /var/cache/conftool/dbconfig/20220929-122953-root.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35167 and previous config saved to /var/cache/conftool/dbconfig/20220929-121814-root.json
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35166 and previous config saved to /var/cache/conftool/dbconfig/20220929-121448-root.json
* 12:10 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 12:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3292
* 12:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3292
* 12:04 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35165 and previous config saved to /var/cache/conftool/dbconfig/20220929-120309-root.json
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35164 and previous config saved to /var/cache/conftool/dbconfig/20220929-115943-root.json
* 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
* 11:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P35163 and previous config saved to /var/cache/conftool/dbconfig/20220929-115612-root.json
* 11:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 209453
* 11:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 209453
* 11:51 ladsgroup@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15695
* 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15695
* 11:45 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 42
* 11:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
* 11:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3856
* 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35162 and previous config saved to /var/cache/conftool/dbconfig/20220929-114438-root.json
* 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35161 and previous config saved to /var/cache/conftool/dbconfig/20220929-114431-ladsgroup.json
* 11:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3856
* 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42
* 11:41 ladsgroup@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 11:40 ladsgroup@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42
* 11:39 ladsgroup@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 11:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62955
* 11:38 ladsgroup@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 11:38 ladsgroup@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 11:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62955
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35160 and previous config saved to /var/cache/conftool/dbconfig/20220929-112933-root.json
* 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35159 and previous config saved to /var/cache/conftool/dbconfig/20220929-112925-ladsgroup.json
* 11:16 XioNoX: re-pool cr2-eqord - [[phab:T295690|T295690]]
* 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P35158 and previous config saved to /var/cache/conftool/dbconfig/20220929-111418-ladsgroup.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35157 and previous config saved to /var/cache/conftool/dbconfig/20220929-111217-root.json
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 codfw primary [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35156 and previous config saved to /var/cache/conftool/dbconfig/20220929-111127-root.json
* 11:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - [[phab:T318892|T318892]]
* 11:06 XioNoX: restart cr2-eqord for upgrade - [[phab:T295690|T295690]]
* 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
* 11:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
* 11:01 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35155 and previous config saved to /var/cache/conftool/dbconfig/20220929-105912-ladsgroup.json
* 10:53 XioNoX: drain cr2-eqord - [[phab:T295690|T295690]]
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 [[phab:T318892|T318892]]', diff saved to https://phabricator.wikimedia.org/P35154 and previous config saved to /var/cache/conftool/dbconfig/20220929-105206-root.json
* 10:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s8 [[phab:T318892|T318892]]
* 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s8 [[phab:T318892|T318892]]
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318892|T318892]]
* 10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
* 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318892|T318892]]
* 10:50 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-eqord,cr2-eqord IPv6 with reason: router upgrade
* 10:40 XioNoX: repool cr2-eqiad - [[phab:T295690|T295690]]
* 10:36 moritzm: installing poppler security updates
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35153 and previous config saved to /var/cache/conftool/dbconfig/20220929-100849-ladsgroup.json
* 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35152 and previous config saved to /var/cache/conftool/dbconfig/20220929-100828-ladsgroup.json
* 10:07 XioNoX: second (and longest) cr2-eqiad RE switchover - [[phab:T295690|T295690]]
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35150 and previous config saved to /var/cache/conftool/dbconfig/20220929-095321-ladsgroup.json
* 09:45 moritzm: restarting superset to pick up expat security update
* 09:43 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 09:42 XioNoX: first cr2-eqiad RE switchover - [[phab:T295690|T295690]]
* 09:41 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 09:38 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P35149 and previous config saved to /var/cache/conftool/dbconfig/20220929-093815-ladsgroup.json
* 09:36 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 09:34 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 09:33 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:33 XioNoX: drain cr2-eqiad - [[phab:T295690|T295690]]
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
* 09:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr2-eqiad,cr2-eqiad IPv6,re0.cr2-eqiad.mgmt with reason: router upgrade
* 09:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:26 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2098.codfw.wmnet with OS bullseye
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35148 and previous config saved to /var/cache/conftool/dbconfig/20220929-092308-ladsgroup.json
* 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:16 XioNoX: repool cr1-eqiad - [[phab:T295690|T295690]]
* 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.3"
* 09:07 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
* 09:04 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
* 08:52 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2098.codfw.wmnet with OS bullseye
* 08:43 XioNoX: second cr1-eqiad RE switchover - [[phab:T295690|T295690]]
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35146 and previous config saved to /var/cache/conftool/dbconfig/20220929-082757-root.json
* 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:15 XioNoX: first cr1-eqiad RE switchover (for NVM firmware) - [[phab:T295690|T295690]]
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35145 and previous config saved to /var/cache/conftool/dbconfig/20220929-081252-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35144 and previous config saved to /var/cache/conftool/dbconfig/20220929-080340-root.json
* 07:57 XioNoX: drain traffic away from cr1-eqiad - [[phab:T295690|T295690]]
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35143 and previous config saved to /var/cache/conftool/dbconfig/20220929-075747-root.json
* 07:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
* 07:49 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cr1-eqiad,cr1-eqiad IPv6,re0.cr1-eqiad.mgmt with reason: router upgrade
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35142 and previous config saved to /var/cache/conftool/dbconfig/20220929-074835-root.json
* 07:45 moritzm: installing expat security updates
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35141 and previous config saved to /var/cache/conftool/dbconfig/20220929-074242-root.json
* 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 18106
* 07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 18106
* 07:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38040
* 07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38040
* 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35280
* 07:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35280
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35140 and previous config saved to /var/cache/conftool/dbconfig/20220929-073330-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35139 and previous config saved to /var/cache/conftool/dbconfig/20220929-072745-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35138 and previous config saved to /var/cache/conftool/dbconfig/20220929-072737-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35137 and previous config saved to /var/cache/conftool/dbconfig/20220929-071825-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35136 and previous config saved to /var/cache/conftool/dbconfig/20220929-071240-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35135 and previous config saved to /var/cache/conftool/dbconfig/20220929-071232-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35134 and previous config saved to /var/cache/conftool/dbconfig/20220929-070320-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35133 and previous config saved to /var/cache/conftool/dbconfig/20220929-065736-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35132 and previous config saved to /var/cache/conftool/dbconfig/20220929-065727-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35131 and previous config saved to /var/cache/conftool/dbconfig/20220929-064815-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35130 and previous config saved to /var/cache/conftool/dbconfig/20220929-064231-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35129 and previous config saved to /var/cache/conftool/dbconfig/20220929-064222-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P35128 and previous config saved to /var/cache/conftool/dbconfig/20220929-063508-root.json
* 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35127 and previous config saved to /var/cache/conftool/dbconfig/20220929-063310-root.json
* 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35126 and previous config saved to /var/cache/conftool/dbconfig/20220929-062726-root.json
* 06:27 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35125 and previous config saved to /var/cache/conftool/dbconfig/20220929-061805-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35124 and previous config saved to /var/cache/conftool/dbconfig/20220929-061221-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35123 and previous config saved to /var/cache/conftool/dbconfig/20220929-060532-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary and set section read-write [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35122 and previous config saved to /var/cache/conftool/dbconfig/20220929-060425-root.json
* 06:03 marostegui: Starting s7 codfw failover from db2121 to db2118 - [[phab:T318888|T318888]]
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35121 and previous config saved to /var/cache/conftool/dbconfig/20220929-055716-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2118 from API [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35120 and previous config saved to /var/cache/conftool/dbconfig/20220929-054542-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 [[phab:T318888|T318888]]', diff saved to https://phabricator.wikimedia.org/P35119 and previous config saved to /var/cache/conftool/dbconfig/20220929-054509-root.json
* 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318888|T318888]]
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T318888|T318888]]
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35118 and previous config saved to /var/cache/conftool/dbconfig/20220929-054211-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 from API [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35117 and previous config saved to /var/cache/conftool/dbconfig/20220929-053951-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35116 and previous config saved to /var/cache/conftool/dbconfig/20220929-053407-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary and set section read-write [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35115 and previous config saved to /var/cache/conftool/dbconfig/20220929-053302-root.json
* 05:32 marostegui: Starting s4 codfw failover from db2110 to db2140 - [[phab:T318886|T318886]]
* 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35114 and previous config saved to /var/cache/conftool/dbconfig/20220929-052805-ladsgroup.json
* 05:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 05:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35113 and previous config saved to /var/cache/conftool/dbconfig/20220929-052743-ladsgroup.json
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35112 and previous config saved to /var/cache/conftool/dbconfig/20220929-051237-ladsgroup.json
* 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T318886|T318886]]
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 [[phab:T318886|T318886]]', diff saved to https://phabricator.wikimedia.org/P35111 and previous config saved to /var/cache/conftool/dbconfig/20220929-051114-root.json
* 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T318886|T318886]]
* 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P35110 and previous config saved to /var/cache/conftool/dbconfig/20220929-045730-ladsgroup.json
* 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35109 and previous config saved to /var/cache/conftool/dbconfig/20220929-044224-ladsgroup.json
* 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35108 and previous config saved to /var/cache/conftool/dbconfig/20220929-035724-ladsgroup.json
* 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35107 and previous config saved to /var/cache/conftool/dbconfig/20220929-035647-ladsgroup.json
* 03:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35106 and previous config saved to /var/cache/conftool/dbconfig/20220929-034140-ladsgroup.json
* 03:40 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
* 03:40 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P35105 and previous config saved to /var/cache/conftool/dbconfig/20220929-032634-ladsgroup.json
* 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35104 and previous config saved to /var/cache/conftool/dbconfig/20220929-031127-ladsgroup.json
* 02:29 ejegg: updated fundraising CiviCRM from {{Gerrit|f3461a44}} to {{Gerrit|5e1738a1}}
* 02:20 ejegg: updated fundraising python tools from {{Gerrit|dd494413}} to {{Gerrit|14d60435}}
* 01:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2037.codfw.wmnet with OS buster
* 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
* 00:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage


== 2019-07-18 ==
== 2022-09-28 ==
* 23:50 mutante: built new scap version 3.11.1-1 on boron, copied to install1002, imported package with reprepro, copied from stretch to jessie and buster ([[phab:T228482|T228482]])
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
* 23:22 Lucas_WMDE: Evening SWAT done
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
* 23:17 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:523141{{!}}Configure Citoid+Wikibase integration on Beta (production no-op) (T228411)]] (duration: 00m 54s)
* 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
* 23:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:523140{{!}}Set $wgWBRepoSettings[enableRefTabs] in Wikibase.php (T228414)]] (duration: 01m 16s)
* 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
* 23:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:523139{{!}}Define settings for Citoid+Wikibase integration (T228414)]] (duration: 00m 55s)
* 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 22:23 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=wdqs1008.eqiad.wmnet
* 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 22:16 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:20 ejegg: updated fundraising CiviCRM from {{Gerrit|d31c19a0}} to {{Gerrit|f3461a44}}
* 22:00 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
* 21:49 bd808: Cleaned up stale striker logs on labweb1001 and labweb1002. Logs go to journald now so log rotate is not triggered to rotate out logs from before that change.
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:42 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 bd808@deploy1001: Finished deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models ([[phab:T228222|T228222]], [[phab:T228332|T228332]]) (duration: 01m 13s)
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
* 21:34 bd808@deploy1001: Started deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models ([[phab:T228222|T228222]], [[phab:T228332|T228332]])
* 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35100 and previous config saved to /var/cache/conftool/dbconfig/20220928-212131-ladsgroup.json
* 21:15 mutante: gerrit (cobalt) - scheduled 1h downtime, rebooting for kernel upgrade
* 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35099 and previous config saved to /var/cache/conftool/dbconfig/20220928-210624-ladsgroup.json
* 21:03 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: [[phab:T228290|T228290]] Fix fatal in ChangesListFormatter::getLogTextLinks() (duration: 01m 02s)
* 21:06 volans: installed spicerack 4.0.0-1+deb11u1 on cumin1001
* 20:57 mutante: gerrit2001 - icinga downtime for 1h
* 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:56 mutante: gerrit2001 - reboot for kernel upgrade
* 20:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:51 mutante: gerrit2001 - apt-get upgrade; apt-get autoremove ; puppet agent -tv
* 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35098 and previous config saved to /var/cache/conftool/dbconfig/20220928-205117-ladsgroup.json
* 19:55 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12200
* 19:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T228374|T228374]] Enable SecureLinkFixer in beta cluster (2/2) (duration: 00m 55s)
* 20:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12200
* 19:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T228374|T228374]] Enable SecureLinkFixer in beta cluster (1/2) (duration: 00m 55s)
* 20:39 TheresNoTime: closing UTC late backport window
* 19:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T207750|T207750]] Revoke editmyuserjsredirect from all users (duration: 00m 54s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:25 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:21 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:20 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:45 mutante: contint2001 - had puppet failure in puppet board / dpkg issue due to unfinished zuul install which was done on contint1001 - stopped zuul and zuul-merger, apt-install zuul (was already latest version but needed to finish configure step), apt-get autoremove to remove unused packages, ran puppet. dpkg and puppet happy again
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]] (duration: 06m 19s)
* 17:45 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/objectcache/RedisBagOStuff.php: {{Gerrit|69cd8b0f49e8caf8c7398ad76a1ce3d2da4f3e6b}} (duration: 00m 55s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:15 Krinkle: krinkle@deploy1001: Pull down https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/523844/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/524276/ (no-op, not deploying)
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:29 XioNoX: upgrade Routinator to 0.5.0 in eqiad - [[phab:T220669|T220669]]
* 20:18 samtar@deploy1002: samtar and essexigyan: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 16:24 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/resources/src/mediawiki.misc-authed-ooui/special.movePage.js: {{Gerrit|e97a284dbe54}} (duration: 00m 58s)
* 20:18 samtar@deploy1002: Started scap: Backport for [[gerrit:836244{{!}}[config]: Deploy GDI survey Wave 3 (T318156)]]
* 16:17 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:11 samtar@deploy1002: Sync cancelled.
* 16:06 XioNoX: upgrade Routinator to 0.5.0 in codfw - [[phab:T220669|T220669]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:05 XioNoX: add routinator 0.5.0 to APT
* 20:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 15:54 fsero: depool ms-fe2005 - [[phab:T228196|T228196]]
* 20:04 samtar@deploy1002: samtar and dani: Backport for [[gerrit:834042{{!}}Deploy Research Incentive survey on arwiki (T318328)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.34.0-wmf.13 # [[phab:T228436|T228436]] [[phab:T220739|T220739]]
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:834042{{!}}Deploy Research Incentive survey on arwiki (T318328)]]
* 15:19 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:24 ejegg: updated fundraising CiviCRM from {{Gerrit|916a8b08}} to {{Gerrit|d31c19a0}}
* 14:46 godog: roll-restart thumbor in codfw - [[phab:T228086|T228086]]
* 19:08 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 14:45 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 18:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:37 liw: all wikis at 1.34.0-wmf.14
* 18:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:36 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
* 18:22 volans: installed spicerack 4.0.0-1+deb11u1 on cumin2002
* 14:28 bblack: cp hosts: apt autoremove to clean up pkgs on the fleet
* 18:22 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3f23a1b]: (no justification provided) (duration: 00m 11s)
* 14:27 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery (duration: 00m 20s)
* 18:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@3f23a1b]: (no justification provided)
* 14:26 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:24 godog: repool thumbor2003
* 18:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:20 godog: reboot thumbor2003
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:17 jijiki: Depool thumbor2003 for reboot
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 03m 38s)
* 14:12 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 18:07 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:53 moritzm: installing php5 security updates
* 18:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 13:50 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:36 jeh: rebooting labstore1005.eqiad.wmnet - [[phab:T224228|T224228]]
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:34 jbond42: remove mtail 3.0.0~rc24.1-1+wmf1 from stretch-wikimedia
* 17:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:30 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 53s)
* 17:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19653
* 13:29 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
* 17:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19653
* 13:24 jbond42: downgrade cp servers back to 3.0.0~rc5-1~bpo9+1
* 17:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:23 liw: promoting 1.34.0-wmf.14 to group1
* 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1037.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:22 godog: temporarily stop ircecho on icinga1001 to avoid spam
* 17:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host logstash1036.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:00 jbond42: rolling upgrade of mtail
* 17:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32098
* 12:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32098
* 12:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:53 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4181
* 12:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4181
* 12:34 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 17:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:26 jbond42: add mtail 3.0.0~rc24.1-1+wmf1 to stretch-wikimedia
* 17:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 11:13 dcausse: EU Swat done
* 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert [cirrus] switch search traffic (except completion) to codfw (duration: 00m 56s)
* 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35097 and previous config saved to /var/cache/conftool/dbconfig/20220928-171848-ladsgroup.json
* 11:02 godog: swift eqiad-prod: put back ms-be1043 sdk1 - [[phab:T218544|T218544]]
* 17:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1024.eqiad.wmnet with OS bullseye
* 10:51 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1024.eqiad.wmnet with OS bullseye
* 10:43 ema: cp-eqiad: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35096 and previous config saved to /var/cache/conftool/dbconfig/20220928-170342-ladsgroup.json
* 10:37 jijiki: enable puppet on services_proxy hosts - [[phab:T228063|T228063]]
* 16:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
* 10:29 godog: reboot wezen.codfw.wmnet - [[phab:T225713|T225713]]
* 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 10:27 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 16:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
* 10:15 jijiki: Disable puppet on services_proxy hosts - [[phab:T228063|T228063]]
* 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P35095 and previous config saved to /var/cache/conftool/dbconfig/20220928-164835-ladsgroup.json
* 09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
* 09:09 godog: resume swift ms-be rolling restarts - [[phab:T225713|T225713]]
* 16:36 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@f89d689]: (no justification provided) (duration: 00m 12s)
* 09:03 fsero: reuploading missing layers [[phab:T228196|T228196]]
* 16:36 nokafor@deploy1002: Started deploy [airflow-dags/analytics@f89d689]: (no justification provided)
* 08:57 hashar: contint1001: stopped zuul, ran apt install to get the new python2.7 copied to Zuul virtualenv, restarted zuul/zuul-merger. That clears a couple Icinga alarms from yesterday
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 08:56 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db2116 [[phab:T226851|T226851]]
* 16:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
* 08:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2116 (duration: 00m 55s)
* 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:18 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:14 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35093 and previous config saved to /var/cache/conftool/dbconfig/20220928-163329-ladsgroup.json
* 06:56 dcausse: deleting zerowiki elastic indices (eqiad and codfw) [[phab:T227718|T227718]]
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:22 marostegui: Stop MySQL on db2045, host will be decommissioned [[phab:T228281|T228281]]
* 16:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
* 05:18 marostegui: Remove db2045 from tendril and zarcillo [[phab:T228281|T228281]]
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 05:16 marostegui: Disable notifications on db2045 [[phab:T228281|T228281]]
* 16:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
* 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2045 from config, will be decommissioned [[phab:T228281|T228281]] (duration: 00m 54s)
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:08 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2045 from config, will be decommissioned [[phab:T228281|T228281]] (duration: 00m 56s)
* 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 04:31 legoktm: running query for [[phab:T227843|T227843]] on mwmaint102
* 16:26 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4775
* 16:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4775
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2635
* 16:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2635
* 16:15 volans: uploaded spicerack_4.0.0 to apt.wikimedia.org bullseye-wikimedia
* 15:57 dancy@deploy1002: Installation of scap version "4.24.0" completed for 561 hosts
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
* 15:57 dancy@deploy1002: Installing scap version "4.24.0" for 561 hosts
* 15:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40217
* 15:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40217
* 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
* 15:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
* 15:51 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0646be1]: (no justification provided) (duration: 00m 10s)
* 15:51 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0646be1]: (no justification provided)
* 15:47 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
* 15:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS buster
* 15:26 moritzm: installing libgoogle-gson-java security updates on bullseye
* 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4922
* 15:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4922
* 15:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714
* 15:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
* 15:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714
* 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 19108
* 15:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 19108
* 15:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
* 15:09 moritzm: installing twisted security updates
* 15:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8674
* 15:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8674
* 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35092 and previous config saved to /var/cache/conftool/dbconfig/20220928-150230-ladsgroup.json
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35091 and previous config saved to /var/cache/conftool/dbconfig/20220928-150158-ladsgroup.json
* 15:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 15:00 SandraEbele: deploying Airflow for hdfsarchiver operator fix
* 15:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@aa7984f]: (no justification provided) (duration: 00m 14s)
* 15:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@aa7984f]: (no justification provided)
* 14:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite1005.eqiad.wmnet with OS bullseye
* 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.wikimedia.org
* 14:53 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
* 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394354
* 14:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 394354
* 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 393950
* 14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 393950
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262589
* 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 262589
* 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209453
* 14:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 209453
* 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
* 14:48 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 199524
* 14:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 65517
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 65517
* 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62955
* 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62955
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 57695
* 14:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 57695
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53334
* 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35090 and previous config saved to /var/cache/conftool/dbconfig/20220928-144651-ladsgroup.json
* 14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 53334
* 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
* 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 52320
* 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46450
* 14:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
* 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46450
* 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
* 14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40217
* 14:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
* 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 14:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
* 14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
* 14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36351
* 14:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35280
* 14:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite1005.eqiad.wmnet with reason: host reimage
* 14:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35280
* 14:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32934
* 14:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32934
* 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32787
* 14:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32787
* 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
* 14:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32098
* 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29791
* 14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 29791
* 14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744
* 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26744
* 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
* 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
* 14:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22987
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P35089 and previous config saved to /var/cache/conftool/dbconfig/20220928-143145-ladsgroup.json
* 14:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22987
* 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22773
* 14:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22773
* 14:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22616
* 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22616
* 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21949
* 14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 14:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host graphite1005.eqiad.wmnet with OS bullseye
* 14:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21949
* 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21928
* 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21928
* 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20115
* 14:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20115
* 14:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19653
* 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19653
* 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19151
* 14:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19151
* 14:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19108
* 14:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 14:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19108
* 14:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18106
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18106
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16735
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16735
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
* 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16276
* 14:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15695
* 14:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15695
* 14:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
* 14:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
* 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14630
* 14:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14630
* 14:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14361
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14361
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13760
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13760
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13489
* 14:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13489
* 14:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13335
* 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35088 and previous config saved to /var/cache/conftool/dbconfig/20220928-141638-ladsgroup.json
* 14:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13335
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12041
* 14:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12041
* 14:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11164
* 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11164
* 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11039
* 14:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11039
* 14:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
* 14:12 volans: added python3-gjson v0.0.5 to apt.w.o (bullseye only)
* 14:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10310
* 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
* 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 14:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
* 14:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8781
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35087 and previous config saved to /var/cache/conftool/dbconfig/20220928-141007-root.json
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35086 and previous config saved to /var/cache/conftool/dbconfig/20220928-141001-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35085 and previous config saved to /var/cache/conftool/dbconfig/20220928-140956-root.json
* 14:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8781
* 14:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35084 and previous config saved to /var/cache/conftool/dbconfig/20220928-140950-root.json
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
* 14:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
* 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
* 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8359
* 14:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudrabbit1003.wikimedia.org
* 14:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8359
* 14:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8075
* 14:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-eqiad
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8075
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7843
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7843
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7795
* 14:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7795
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7784
* 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7784
* 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7713
* 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7713
* 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7195
* 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
* 14:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7195
* 14:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6762
* 14:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host graphite1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6762
* 14:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6614
* 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6614
* 14:02 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6128
* 14:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6128
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6079
* 14:01 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
* 14:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6079
* 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5650
* 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5650
* 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5400
* 14:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5400
* 14:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4922
* 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4922
* 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4826
* 13:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4826
* 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4775
* 13:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4775
* 13:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
* 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4637
* 13:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4230
* 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4230
* 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4181
* 13:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4181
* 13:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35083 and previous config saved to /var/cache/conftool/dbconfig/20220928-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35082 and previous config saved to /var/cache/conftool/dbconfig/20220928-135456-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35081 and previous config saved to /var/cache/conftool/dbconfig/20220928-135451-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35080 and previous config saved to /var/cache/conftool/dbconfig/20220928-135445-root.json
* 13:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
* 13:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3300
* 13:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:52 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 13:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3300
* 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3292
* 13:50 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
* 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3292
* 13:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2906
* 13:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
* 13:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2906
* 13:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2647
* 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2647
* 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2635
* 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2635
* 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2603
* 13:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2603
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1273
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1273
* 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 812
* 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 812
* 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 714
* 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 714
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35079 and previous config saved to /var/cache/conftool/dbconfig/20220928-133957-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35078 and previous config saved to /var/cache/conftool/dbconfig/20220928-133951-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35077 and previous config saved to /var/cache/conftool/dbconfig/20220928-133946-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35076 and previous config saved to /var/cache/conftool/dbconfig/20220928-133940-root.json
* 13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=1) rolling restart_daemons on A:thanos-fe-codfw
* 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 577
* 13:32 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe-codfw
* 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 577
* 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
* 13:31 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
* 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35075 and previous config saved to /var/cache/conftool/dbconfig/20220928-132452-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35074 and previous config saved to /var/cache/conftool/dbconfig/20220928-132446-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35073 and previous config saved to /var/cache/conftool/dbconfig/20220928-132442-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35072 and previous config saved to /var/cache/conftool/dbconfig/20220928-132435-root.json
* 13:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main'.
* 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
* 13:15 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35071 and previous config saved to /var/cache/conftool/dbconfig/20220928-130947-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35070 and previous config saved to /var/cache/conftool/dbconfig/20220928-130941-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35069 and previous config saved to /var/cache/conftool/dbconfig/20220928-130937-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35068 and previous config saved to /var/cache/conftool/dbconfig/20220928-130930-root.json
* 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
* 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'.
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main'.
* 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main'.
* 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main'.
* 13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main'.
* 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35067 and previous config saved to /var/cache/conftool/dbconfig/20220928-125442-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35066 and previous config saved to /var/cache/conftool/dbconfig/20220928-125436-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35065 and previous config saved to /var/cache/conftool/dbconfig/20220928-125432-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35064 and previous config saved to /var/cache/conftool/dbconfig/20220928-125425-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35063 and previous config saved to /var/cache/conftool/dbconfig/20220928-123937-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35062 and previous config saved to /var/cache/conftool/dbconfig/20220928-123932-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35061 and previous config saved to /var/cache/conftool/dbconfig/20220928-123927-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35060 and previous config saved to /var/cache/conftool/dbconfig/20220928-123920-root.json
* 12:34 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35058 and previous config saved to /var/cache/conftool/dbconfig/20220928-122432-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35057 and previous config saved to /var/cache/conftool/dbconfig/20220928-122427-root.json
* 12:24 gehel: copying wmf-elasticsearch-search-plugins from bullseye to buster (`reprepro -C thirdparty/elastic710 copy buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35056 and previous config saved to /var/cache/conftool/dbconfig/20220928-122422-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35055 and previous config saved to /var/cache/conftool/dbconfig/20220928-122421-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35054 and previous config saved to /var/cache/conftool/dbconfig/20220928-122415-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35053 and previous config saved to /var/cache/conftool/dbconfig/20220928-122414-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35052 and previous config saved to /var/cache/conftool/dbconfig/20220928-122411-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35051 and previous config saved to /var/cache/conftool/dbconfig/20220928-122403-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35050 and previous config saved to /var/cache/conftool/dbconfig/20220928-122356-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35049 and previous config saved to /var/cache/conftool/dbconfig/20220928-122350-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35048 and previous config saved to /var/cache/conftool/dbconfig/20220928-122346-root.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P35047 and previous config saved to /var/cache/conftool/dbconfig/20220928-122321-root.json
* 12:22 gehel: above reprepro copy failed, elastic710 component does not exist yet
* 12:21 XioNoX: re-enable Init7 in knams
* 12:21 gehel: copying wmf-elasticsearch-search-plugins from bullseye to buster (`reprepro -C elastic710 buster-wikimedia bullseye-wikimedia wmf-elasticsearch-search-plugins`)
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 db2146 db2122 es2022 for mariadb upgrade [[phab:T318128|T318128]]', diff saved to https://phabricator.wikimedia.org/P35046 and previous config saved to /var/cache/conftool/dbconfig/20220928-121912-root.json
* 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 12:09 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35045 and previous config saved to /var/cache/conftool/dbconfig/20220928-120916-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35044 and previous config saved to /var/cache/conftool/dbconfig/20220928-120909-root.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35043 and previous config saved to /var/cache/conftool/dbconfig/20220928-120906-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35042 and previous config saved to /var/cache/conftool/dbconfig/20220928-120858-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35041 and previous config saved to /var/cache/conftool/dbconfig/20220928-120852-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35040 and previous config saved to /var/cache/conftool/dbconfig/20220928-120845-root.json
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35039 and previous config saved to /var/cache/conftool/dbconfig/20220928-120841-root.json
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 11:58 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35038 and previous config saved to /var/cache/conftool/dbconfig/20220928-115411-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35037 and previous config saved to /var/cache/conftool/dbconfig/20220928-115404-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35036 and previous config saved to /var/cache/conftool/dbconfig/20220928-115401-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35035 and previous config saved to /var/cache/conftool/dbconfig/20220928-115354-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35034 and previous config saved to /var/cache/conftool/dbconfig/20220928-115347-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35033 and previous config saved to /var/cache/conftool/dbconfig/20220928-115340-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35032 and previous config saved to /var/cache/conftool/dbconfig/20220928-115336-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35031 and previous config saved to /var/cache/conftool/dbconfig/20220928-113906-root.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35030 and previous config saved to /var/cache/conftool/dbconfig/20220928-113900-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35029 and previous config saved to /var/cache/conftool/dbconfig/20220928-113856-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35028 and previous config saved to /var/cache/conftool/dbconfig/20220928-113849-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35027 and previous config saved to /var/cache/conftool/dbconfig/20220928-113842-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35026 and previous config saved to /var/cache/conftool/dbconfig/20220928-113835-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35025 and previous config saved to /var/cache/conftool/dbconfig/20220928-113831-root.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35024 and previous config saved to /var/cache/conftool/dbconfig/20220928-112401-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35023 and previous config saved to /var/cache/conftool/dbconfig/20220928-112355-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35022 and previous config saved to /var/cache/conftool/dbconfig/20220928-112351-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35021 and previous config saved to /var/cache/conftool/dbconfig/20220928-112344-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35020 and previous config saved to /var/cache/conftool/dbconfig/20220928-112337-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35019 and previous config saved to /var/cache/conftool/dbconfig/20220928-112330-root.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35018 and previous config saved to /var/cache/conftool/dbconfig/20220928-112326-root.json
* 11:18 moritzm: installing expat security updates
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35017 and previous config saved to /var/cache/conftool/dbconfig/20220928-110856-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35016 and previous config saved to /var/cache/conftool/dbconfig/20220928-110850-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35015 and previous config saved to /var/cache/conftool/dbconfig/20220928-110846-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35014 and previous config saved to /var/cache/conftool/dbconfig/20220928-110839-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35013 and previous config saved to /var/cache/conftool/dbconfig/20220928-110832-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35012 and previous config saved to /var/cache/conftool/dbconfig/20220928-110825-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35011 and previous config saved to /var/cache/conftool/dbconfig/20220928-110821-root.json
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35010 and previous config saved to /var/cache/conftool/dbconfig/20220928-105531-ladsgroup.json
* 10:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 10:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35009 and previous config saved to /var/cache/conftool/dbconfig/20220928-105520-ladsgroup.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35008 and previous config saved to /var/cache/conftool/dbconfig/20220928-105351-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35007 and previous config saved to /var/cache/conftool/dbconfig/20220928-105345-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35006 and previous config saved to /var/cache/conftool/dbconfig/20220928-105340-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35005 and previous config saved to /var/cache/conftool/dbconfig/20220928-105332-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35004 and previous config saved to /var/cache/conftool/dbconfig/20220928-105327-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35003 and previous config saved to /var/cache/conftool/dbconfig/20220928-105320-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35002 and previous config saved to /var/cache/conftool/dbconfig/20220928-105315-root.json
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P35001 and previous config saved to /var/cache/conftool/dbconfig/20220928-104014-ladsgroup.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35000 and previous config saved to /var/cache/conftool/dbconfig/20220928-103847-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34999 and previous config saved to /var/cache/conftool/dbconfig/20220928-103840-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34998 and previous config saved to /var/cache/conftool/dbconfig/20220928-103835-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34997 and previous config saved to /var/cache/conftool/dbconfig/20220928-103827-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34996 and previous config saved to /var/cache/conftool/dbconfig/20220928-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34995 and previous config saved to /var/cache/conftool/dbconfig/20220928-103815-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34994 and previous config saved to /var/cache/conftool/dbconfig/20220928-103810-root.json
* 10:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 db1137 db1168 db1143 db1132 db1127 es1022 for mariadb upgrade [[phab:T318128|T318128]]', diff saved to https://phabricator.wikimedia.org/P34993 and previous config saved to /var/cache/conftool/dbconfig/20220928-102759-root.json
* 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P34992 and previous config saved to /var/cache/conftool/dbconfig/20220928-102508-ladsgroup.json
* 10:19 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 10:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 10:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34990 and previous config saved to /var/cache/conftool/dbconfig/20220928-101001-ladsgroup.json
* 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59689
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59689
* 08:49 jbond: disable puppet on cache servers to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/832268
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34989 and previous config saved to /var/cache/conftool/dbconfig/20220928-084557-ladsgroup.json
* 08:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34988 and previous config saved to /var/cache/conftool/dbconfig/20220928-084535-ladsgroup.json
* 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34987 and previous config saved to /var/cache/conftool/dbconfig/20220928-083029-ladsgroup.json
* 08:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P34985 and previous config saved to /var/cache/conftool/dbconfig/20220928-081522-ladsgroup.json
* 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34984 and previous config saved to /var/cache/conftool/dbconfig/20220928-080015-ladsgroup.json
* 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:30 XioNoX: disable BGP to init7 in knams
* 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]] (duration: 05m 17s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:03 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 07:03 kartik@deploy1002: Started scap: Backport for [[gerrit:835606{{!}}testwiki: Enable Section Translation for Bambara and Goan Konkani Wikipedias (T314557)]]
* 06:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34981 and previous config saved to /var/cache/conftool/dbconfig/20220928-043052-ladsgroup.json
* 04:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34980 and previous config saved to /var/cache/conftool/dbconfig/20220928-043030-ladsgroup.json
* 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34979 and previous config saved to /var/cache/conftool/dbconfig/20220928-041524-ladsgroup.json
* 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P34978 and previous config saved to /var/cache/conftool/dbconfig/20220928-040017-ladsgroup.json
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34977 and previous config saved to /var/cache/conftool/dbconfig/20220928-034511-ladsgroup.json
* 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34976 and previous config saved to /var/cache/conftool/dbconfig/20220928-020746-ladsgroup.json
* 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34975 and previous config saved to /var/cache/conftool/dbconfig/20220928-020724-ladsgroup.json
* 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34974 and previous config saved to /var/cache/conftool/dbconfig/20220928-015218-ladsgroup.json
* 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P34973 and previous config saved to /var/cache/conftool/dbconfig/20220928-013711-ladsgroup.json
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34972 and previous config saved to /var/cache/conftool/dbconfig/20220928-012205-ladsgroup.json
* 01:18 ejegg: updated fundraising python tools from {{Gerrit|b65109af}} to {{Gerrit|dd494413}}
* 00:34 eileen: civicrm upgraded from {{Gerrit|118c1d0b}} to {{Gerrit|916a8b08}}
* 00:11 eileen: civicrm upgraded from {{Gerrit|e198fb4c}} to {{Gerrit|118c1d0b}}


== 2019-07-17 ==
== 2022-09-27 ==
* 23:51 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wmgUseTheWikipediaLibrary (false everywhere, no-op) (duration: 00m 54s)
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 23:48 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseTheWikipediaLibrary (false everywhere, no-op) (duration: 00m 53s)
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 22:35 mutante: reimaging mw2250 after disks have been replaced
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 22:16 hoo: Manually started the Wikidata RDF dumps on snapshot1008 (due to [[phab:T228104|T228104]])
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:42 apergos: started wikidata entity dumps json run on snapshot1008
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 21:37 nuria: deployment aborted for refinery 0.0.94
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:37 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: refinery 0.0.94 (duration: 36m 28s)
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 21:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/loadbalancer: [[phab:T228104|T228104]] rdbms: better handle a non-existing  defaultGroup in LoadBalancer (duration: 00m 55s)
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 21:15 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Clean up accidentally-deployed debugging code for [[phab:T228290|T228290]] (duration: 01m 02s)
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 21:10 otto@deploy1001: Finished deploy [eventstreams/deploy@dbc9bbb]: Fix ?doc to use openapi instead of swagger - [[phab:T227958|T227958]] (duration: 02m 52s)
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 21:07 otto@deploy1001: Started deploy [eventstreams/deploy@dbc9bbb]: Fix ?doc to use openapi instead of swagger - [[phab:T227958|T227958]]
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 21:00 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: refinery 0.0.94
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 20:35 accraze@deploy1001: Finished deploy [ores/deploy@676f7ba]: [[phab:T228331|T228331]] (duration: 24m 59s)
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:10 accraze@deploy1001: Started deploy [ores/deploy@676f7ba]: [[phab:T228331|T228331]]
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 19:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/rdbms/loadbalancer: [[phab:T228104|T228104]] rdbms: better handle a non-existing  defaultGroup in LoadBalancer (duration: 00m 55s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2181.codfw.wmnet
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:36 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s eqiad
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:28 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s codfw
* 21:12 TheresNoTime: closing UTC late backport window
* 18:26 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s esams
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 18:25 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s ulsfo
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:23 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s eqsin
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:20 cdanis: cdanis@mw1261.eqiad.wmnet ~ % sudo -i pool
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:19 cdanis: testing conftool upgrade: cdanis@mw1261.eqiad.wmnet ~ % sudo -i depool
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:15 mutante: mw2181 - sudo: /usr/local/bin/mwscript: command not found  on scap pull ??
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 18:14 mutante: mw2181 - scap pull ([[phab:T205240|T205240]])
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 18:06 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s mw-canary
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:02 cdanis: upgrade to python3-conftool 1.1.1-1 on mwdebug2001
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include jessie-wikimedia conftool/conftool_1.1.1-1+deb8u1_amd64.changes
* 20:59 TheresNoTime: extending UTC late backport window
* 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include buster-wikimedia conftool/conftool_1.1.1-1+deb10u1_amd64.changes
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include stretch-wikimedia conftool/conftool_1.1.1-1_amd64.changes
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:09 papaul: shutting down restbase2009 for firmware upgrade
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:06 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[0{{!}}1] wikis to 1.34.0-wmf.13"
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 16:57 dcausse: morning swat done
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 16:54 dcausse@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CirrusSearch/includes/ElasticaErrorHandler.php: [[phab:T228283|T228283]]: Log response data JSON on errors (duration: 00m 55s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:48 Urbanecm: Deployed patch for [[phab:T207094|T207094]]
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:47 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:40 elukey: execute reprepro clearvanished on install1002 to clear buster-wikimedia{{!}}thirdparty/amd-rocm (not used anymore)
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 16:37 dcausse: reopening morning SWAT
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:24 papaul: shutting down mw2181 for firmware upgrade
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 16:19 jijiki: Depool mw2181 - [[phab:T205240|T205240]]
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 16:08 Urbanecm: Morning SWAT done
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 16:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Raise zh_classicalwiki requirement for autoconfirmed ([[phab:T228141|T228141]]) (duration: 00m 55s)
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:07 cmjohnson1: powering off cloudvirt1014 for rack move [[phab:T226188|T226188]]
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:523686{{!}}Enable partial blocks on dewiki]] ([[phab:T228150|T228150]]) (duration: 00m 54s)
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 16:01 jbond42: copy confd package from stretch-wikimedia to buster-wikimedia
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:47 Urbanecm: Re-syncing patch for [[phab:T207094|T207094]] [[phab:T228284|T228284]] and wmf.14
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 15:37 Urbanecm: Deployed patch for [[phab:T207094|T207094]] [[phab:T228284|T228284]] to wmf.13 and wmf.14
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 15:15 fsero: restarting swift-container-sync on ms-be* for getting logging configuration [[phab:T228196|T228196]]
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 15:11 papaul: shutting down mw2250 for disk replacement
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:10 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 hashar: upgrading CI Jenkins # [[phab:T228142|T228142]]
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:06 papaul: shutting down ms-be2022 for HW troubleshooting
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 15:03 jijiki: Depool mw2269 to reboot it - [[phab:T227548|T227548]]
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 15:00 godog: poweroff ms-be2022 - [[phab:T227667|T227667]]
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:55 moritzm: updated jenkins in thirdparty/ci (stretch) and thirdparty (jessie) to 2.176.2 ([[phab:T228142|T228142]])
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:45 fsero: enabling container-sync logging [[phab:T228196|T228196]]
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 14:41 otto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:41 otto@cumin1001: START - Cookbook sre.hosts.decommission
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:35 moritzm: restart pybal on lvs2002 (codfw primary) [[phab:T227778|T227778]]
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 14:32 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 14:31 gehel: repool maps1004 - [[phab:T218097|T218097]]
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 54s)
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:10 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 14:09 moritzm: restarting pybal on backup LVSes in codfw
* 20:15 samtar@deploy1002: Started scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]]
* 14:02 liw@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CirrusSearch/includes/Searcher.php: Do not serialize ResultsType instance [[phab:T228276|T228276]] (duration: 00m 55s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:37 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:26 moritzm: disabled puppet on Icinga hosts in preparation of adding the LDAP replicas/codfw to LVS
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 ema: cp-codfw: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 ema: prometheus servers: remove varnish-upload_$dc_backend.yaml, replaced by ATS equivalent [[phab:T227668|T227668]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] (duration: 05m 46s)
* 12:57 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:36 godog: upgrade hp raid firmware on ms-be1 hosts - [[phab:T141756|T141756]]
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:15 Urbanecm: Running foreachwiki extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php in tmux session on mwmaint1002 ([[phab:T209565|T209565]])
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:11 Urbanecm: Ran extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php for cawiki and viwiki ([[phab:T209565|T209565]])
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:04 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 11:30 mlitn@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/WikibaseMediaInfo: [WikibaseMediaInfo] Revert "Add Wikidata links to statement UI elements" (duration: 00m 56s)
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]]
* 11:16 dcausse: reindexing wikidata (elastic@eqiad) [[phab:T227136|T227136]]
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T227136|T227136]]: [cirrus] switch search traffic (except completion) to codfw (duration: 00m 54s)
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 10:53 moritzm: re-enabled icinga1001 in meta monitoring
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
* 10:41 godog: install updated linux-image-4.9.0-9-amd64 on ms-be hosts
* 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 10:30 godog: start rolling reboot of ms-be eqiad hosts - [[phab:T225713|T225713]]
* 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 10:30 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 10:23 moritzm: rebooting icinga1001 for kernel update
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:20 moritzm: disabled icinga1001 in meta monitoring
* 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:08 moritzm: rebooting lithium for kernel update
* 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 10:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:02 brennen: 1.40.0-wmf.3 ([[phab:T314192|T314192]]) no current blockers, promoting to group0
* 10:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
* 09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
* 09:33 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 09:23 moritzm: rebooting grafana1001 to pick up MDS-enabled qemu
* 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 09:21 ema: cp-ats: upgrade fifo-log-demux to 0.3 [[phab:T227668|T227668]]
* 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 09:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool and clarify db2045 status [[phab:T227862|T227862]] (duration: 00m 55s)
* 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 09:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 09:15 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 09:07 ema: upload fifo-log-demux 0.3 to stretch-wikimedia [[phab:T227668|T227668]]
* 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 08:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 08:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 08:36 jijiki: Disable puppet on thumbor* in eqiad, depool and pool back to apply 523728 - [[phab:T224572|T224572]]
* 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 08:17 jijiki: Pool mw1239 - [[phab:T227867|T227867]]
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 07:48 godog: swift eqiad-prod: put back ms-be1043 sdk1 - [[phab:T218544|T218544]]
* 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 07:46 ema: cp-esams: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades [[phab:T227672|T227672]]
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 07:33 moritzm: reimaging sarin for some tests
* 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
* 06:59 elukey: apply mcrouter async replication to mw2224 - [[phab:T225642|T225642]]
* 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
* 06:25 elukey: reboot analytics1072 as attempt to clear the megacli's config (and add a new disk)
* 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 06:20 elukey: sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to reset opcache
* 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 05:26 marostegui: Stop MySQL on db1065 for decommissioning - [[phab:T227560|T227560]]
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
* 05:24 marostegui: Remove db1065 from tendril and zarcillo - [[phab:T227560|T227560]]
* 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
* 03:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: [[phab:T227772|T227772]] (duration: 00m 54s)
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
* 03:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: [[phab:T227772|T227772]] (duration: 00m 56s)
* 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 03:00 tstarling@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 54s)
* 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 02:58 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 57s)
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
* 00:50 mutante: wikitech-static commented out cert renewal cron job out of caution - still needs fixing but continue tomorrow
* 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 00:12 mutante: wikitech-static - adding (undocumented!) option webroot-map to certbot config to use webroot authenticator with different document roots per domain while using the config file and not cli params ([[phab:T214640|T214640]])
* 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]], 710183 rows done
* 00:01 mutante: wikitech-static certbot --dry-run renew ([[phab:T214640|T214640]])
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
* 00:01 mutante: wikitech-static changing certbot renewalparams: authenticator = webroot (changed from standalone), install = apache (unchanged) ([[phab:T214640|T214640]])
* 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
* 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]]
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 taavi@deploy1002: Finished scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] (duration: 06m 59s)
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 taavi@deploy1002: taavi and migr: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:59 taavi@deploy1002: Started scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]]
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
* 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 jbond: upload new wmf-laptop_0.5.4 package
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
* 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update [[phab:T311686|T311686]]
* 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
* 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
* 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
* 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
* 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
* 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - [[phab:T310745|T310745]]
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
* 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - [[phab:T310745|T310745]]
* 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
* 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
* 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
* 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
* 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
* 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
* 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 [[phab:T318128|T318128]]
* 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 36m 01s)
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2019-07-16 ==
== 2022-09-26 ==
* 23:53 RoanKattouw: Deployed patch for [[phab:T207094|T207094]]
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 23:27 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/skins/MinervaNeue/: Do not load main menu icons in critical path ([[phab:T227929|T227929]]) (duration: 00m 55s)
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 23:26 catrope@deploy1001: Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Do not load main menu icons in critical path ([[phab:T227929|T227929]]) (duration: 00m 56s)
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 23:26 mutante: wikitech-static - current status with method 'standalone' is that it's broken on cert renewal and gets fixed by restarting apache, which makes no sense since the previous fixes were the straight opposite and the ticket claims the fix was moving back from apache to standalone ([[phab:T214640|T214640]])
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 23:26 fsero: repool ms-fe2005 [[phab:T228196|T228196]]
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 23:23 mutante: wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me -> [[phab:T204840|T204840]]#5243222 i previously did the opposite change in [[phab:T214640|T214640]]#4907685 to fix it) and that takes down apache during the renewal ([[phab:T214640|T214640]])
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 23:20 mutante: wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me) and that takes down apache during the renewal
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 23:17 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/GrowthExperiments/: Don't use timestamp in help panel questions in Flow ([[phab:T212433|T212433]]) (duration: 00m 56s)
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 23:09 mutante: wikitech-static got ssl config files in sync with the repo, the difference was really just that space on one line each though ([[phab:T225258|T225258]])
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 22:35 fsero: uploading only blobs on docker-registry-codfw from a backup on ms-fe2005 [[phab:T228196|T228196]]
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 22:29 mutante: wikitech-static the diff between the ssl config files in the repo and on server were just a space at the end of the ServerAdmin line .... [[phab:T225258|T225258]]
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 22:28 fsero: depooling ms-fe2005 for swift upload for registry [[phab:T228196|T228196]]
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 22:26 mutante: wikitech-static ran certbot with --dry-run renew to confirm cert renewal works and it was just fine .. 2 minutes later apache errors which were fixed by restarting apache2 ([[phab:T214640|T214640]])
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 22:24 mutante: wikitech-static restarted apache
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 22:11 mutante: wikitech-static: turn /etc/apache2/sites-available/wikitech-static.wikimedia.org-ssl.conf and status.wikimedia.org-ssl.conf into symlinks to /wikitech-static/apache/ to match config for http vhosts ([[phab:T225258|T225258]])
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:06 mutante: wikitech-static: move /etc/apache2/sites-available/000-default.conf and default-ssl.conf out of directory and reload apache to confirm they are not used and get us in sync with the repo contents again ([[phab:T225258|T225258]])
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:17 bd808@deploy1001: Finished deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade ([[phab:T221657|T221657]], [[phab:T227508|T227508]]) (duration: 01m 08s)
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:15 bd808@deploy1001: Started deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade ([[phab:T221657|T221657]], [[phab:T227508|T227508]])
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:55 SMalyshev: repooled wdqs2004 and wdqs2001 - reload done
* 20:31 TheresNoTime: closing UTC late backport window
* 20:26 mutante: ganeti1001 - gnt-instance remove netmon1003.wikimedia.org ([[phab:T220355|T220355]])
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 19:59 XioNoX: update ACLs on pfw3-eqiad/codfw - [[phab:T228205|T228205]]
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:52 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:51 fsero: republishing base images for wikimedia-(stretch,jessie and buster) [[phab:T228196|T228196]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:58 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 18:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 18:54 gehel: data copy from wdqs2004 to wdqs2001 - [[phab:T228122|T228122]]
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 18:47 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - Produce revision-create stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 54s)
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 18:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce revision-create stream to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 54s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Update ExtensionDistributor config to point to REL1_33 as the released version (duration: 00m 54s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:05 fsero: republishing base images for nodejs-slim due to registry [[phab:T228196|T228196]]
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:02 andrewbogott: rebooting cloudcontrol2003-dev, cloudweb2001-dev, cloudcontrol1004 for [[phab:T225713|T225713]]
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:39 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce centralnotice.campaign-* streams to eventgate-main - [[phab:T211248|T211248]] (duration: 00m 55s)
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 17:23 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to {{Gerrit|334a4c4}} ([[phab:T227907|T227907]]) (duration: 04m 51s)
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 17:19 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to {{Gerrit|334a4c4}} ([[phab:T227907|T227907]])
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 16:55 mutante: netmon1003: shutdown -h now {{!}} ganeti1001: gnt-instance shutdown netmon1003.wikmedia.org - removed from icinga [[phab:T198939|T198939]] [[phab:T220355|T220355]]
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 16:36 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling  jobs to PHP7 - [[phab:T219150|T219150]] (duration: 00m 50s)
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 16:35 jiji@deploy1001: Started deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling  jobs to PHP7 - [[phab:T219150|T219150]]
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 16:28 dcausse: reindexing wikidata (elastic@eqiad) [[phab:T227136|T227136]]
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 15:57 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 15:37 elukey: reboot analytics1072 as attempt to force the raid controller to set a drive failed - [[phab:T226467|T226467]]
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 15:12 elukey: start mariadb on db1107 and re-enable mysql consumers on eventlog1002 and replication on db1108
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 14:53 elukey: stop mariadb on db1107 to allow maintenance
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:53 elukey: stop eventlogging mysql consumers on eventlog1002 and eventlogging_sync on db1108 to allow db1107 maintenance
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 14:52 jbond42: will restart redis on oresdb at 16:00 UTC - [[phab:T228045|T228045]]
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 14:51 jbond42: enable puppet across the fleet
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 14:50 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime