Server Admin Log: Difference between revisions

imported>Stashbot
(ejegg: updated fundraising CiviCRM from 6d90d0cf06 to 12c5727a23)
imported>Stashbot
(sukhe: disable puppet on dns4003 till we resolve the puppet failures)
(992 intermediate revisions by 4 users not shown)
== 2019-10-02 ==
== 2022-10-05 ==
* 00:54 ejegg: updated fundraising CiviCRM from {{Gerrit|6d90d0cf06}} to {{Gerrit|12c5727a23}}
* 00:05 sukhe: disable puppet on dns4003 till we resolve the puppet failures
* 00:34 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/resources/src: {{Gerrit|5eb3ae1e888e353}} (duration: 01m 00s)
* 00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: {{Gerrit|d30064229f9}} (duration: 00m 59s)


== 2019-10-01 ==
== 2022-10-04 ==
* 23:46 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: [[phab:T233127|T233127]]: ApiVisualEditor: Add logging for RESTBase HTTP errors (duration: 00m 58s)
* 23:09 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:44 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: [[phab:T233211|T233211]]: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
* 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: [[phab:T233211|T233211]]: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
* 21:28 cjming: end of UTC late backport window
* 23:28 mutante: cobalt (gerrit) rsyncing /srv/gerrit/plugins dir, push to new server gerrit1001 ([[phab:T222391|T222391]])
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:21 mutante: gerrit1001 - chown -R gerrit2:gerrit2 /srv/gerrit/git/ ([[phab:T222391|T222391]])
* 21:25 cjming@deploy1002: Finished scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] (duration: 05m 06s)
* 23:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T233211|T233211]]: CirrusSearch: Configuration for glent m0 AB test (duration: 00m 58s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T233127|T233127]]: Add VisualEditor logging channel to wmgMonologChannels (duration: 00m 59s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:30 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:19 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
* 21:21 cjming@deploy1002: cjming and cjming: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:34 godog: swift codfw-prod: add ms-be2051 with minimal weight - [[phab:T233638|T233638]] [[phab:T222366|T222366]]
* 21:20 cjming@deploy1002: Started scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]]
* 21:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: {{Gerrit|bb2fd9cf9c22cc}} (duration: 01m 00s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:29 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
* 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:29 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
* 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
* 21:07 cjming@deploy1002: Finished scap: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]] (duration: 05m 40s)
* 19:58 mutante: cobalt (gerrit) - rsyncing gerrit data to gerrit1001 in a screen session ([[phab:T222391|T222391]])
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:47 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:47 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:42 marxarelli: 1.34.0-wmf.25 promoted to group0 cc: [[phab:T220750|T220750]]. no rise in relevant error rates
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.25
* 21:01 cjming@deploy1002: cjming and mdsshakil: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 19:30 marxarelli: promoting 1.34.0-wmf.25 to group0
* 21:01 cjming@deploy1002: Started scap: Backport for [[gerrit:838101{{!}}Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)]]
* 19:28 dduvall@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache (duration: 19m 31s)
* 20:59 cjming@deploy1002: Finished scap: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]] (duration: 06m 35s)
* 19:08 dduvall@deploy1001: Started scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:07 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.23 (duration: 01m 32s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:04 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.22 (duration: 01m 41s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:02 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.21 (duration: 01m 57s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:01 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
* 20:53 cjming@deploy1002: cjming and trainbranchbot: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 19:00 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
* 20:52 cjming@deploy1002: Started scap: Backport for [[gerrit:838264{{!}}Revert "Add wordmark and tagline for Bengali Wikibooks"]]
* 18:59 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.20 (duration: 02m 11s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:57 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.19 (duration: 02m 12s)
* 20:49 cjming@deploy1002: Sync cancelled.
* 18:54 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.17 (duration: 02m 48s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:48 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.16 (duration: 18m 45s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:53 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:52 thcipriani: gerrit restart for new config changes incoming
* 20:42 cjming@deploy1002: cjming and aishik: Backport for [[gerrit:838207{{!}}Add wordmark and tagline for Bengali Wikibooks (T319320)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 17:52 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
* 20:41 cjming@deploy1002: Started scap: Backport for [[gerrit:838207{{!}}Add wordmark and tagline for Bengali Wikibooks (T319320)]]
* 17:50 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
* 20:39 cjming@deploy1002: Finished scap: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]] (duration: 14m 29s)
* 17:48 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:48 XioNoX: rotate PDUs passwords - [[phab:T233053|T233053]]
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:14 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T156095|T156095]] - {{Gerrit|c28baa1862401}} (duration: 00m 59s)
* 20:25 cjming@deploy1002: cjming and d3r1ck01: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 17:07 mutante: Welcome new deployer Andrew Kostka (WMDE) ([[phab:T233202|T233202]])
* 20:25 cjming@deploy1002: Started scap: Backport for [[gerrit:838104{{!}}ParsoidHandler: use metrics from SiteConfig]]
* 17:07 marxarelli: cutting wmf/1.34.0-wmf.25
* 19:54 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS buster
* 16:16 _joe_: manually downgrading php-geoip on deploy*, it was still at the 7.0-only version from the distro
* 18:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
* 18:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
* 18:34 mutante: gerrit - deploying puppet refactoring change
* 16:10 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
* 18:34 tzatziki: removing 1 file for legal compliance
* 16:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
* 15:36 _joe_: uninstalling temporarily the math rendering related packages from mwdebug2002, test for [[phab:T195847|T195847]]
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:36 elukey: powercycle an-conf1001 to test some bios settings
* 18:24 tzatziki: removing 1 file for legal compliance
* 15:12 jbond42: puppetmaster2001 is back online
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:34 dcausse: created cirrussearch indices for nqowiki ([[phab:T234326|T234326]])
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:18 moritzm: rebooting krb1001 for some tests
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:21 moritzm: installing gdk-pixbuf security updates
* 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 18:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 14:10 hashar: Restarting CI Jenkins
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/labs/private ; git rev-parse HEAD {{!}} sudo tee /srv/config-master/labsprivate-sha1.txt )
* 18:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/operations/puppet ; git rev-parse HEAD {{!}} sudo tee /srv/config-master/puppet-sha1.txt )
* 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:08 herron: beginning rolling reboots of eqiad and codfw logstash collectors
* 17:59 ejegg: turned fundraising scheduled jobs back on
* 14:02 moritzm: rebooting mw1265 for some tests
* 17:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]] (duration: 06m 58s)
* 14:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:55 moritzm: installing libsndfile security updates
* 13:59 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ sudo touch /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt && sudo chown gitpuppet:gitpuppet /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt
* 17:50 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 13:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:50 urbanecm@deploy1002: Started scap: Backport for [[gerrit:838105{{!}}Mentee table: fix wrong less import (T319321)]]
* 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:49 ejegg: turned off fundraising scheduled jobs for civi deploy
* 13:24 jbond42: reimage puppetmaster2001
* 17:28 tzatziki: removing 4 files for legal compliance
* 12:37 hashar: Gerrit misbehaved temporarily due to human operator error (hashar ran jstack -l -m, which brings the JVM to a halt)
* 17:04 mutante: gerrit - deployed 832345 - scap and daemon users became decoupled ([[phab:T317412|T317412]])
* 11:16 jbond42: update puppet.ulsfo.wmnet to point to puppetmaster1001
* 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:45 jbond42: update puppet.esqin.wmnet to point to puppetmaster1001
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:17 moritzm: upgrading ferm on remaining mw servers 2.4.2pre [[phab:T153468|T153468]]
* 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:35 moritzm: run systemctl reset-failed on puppetmaster2002 to clear failed puppet-master.service
* 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:19 moritzm: upgrading ferm on a number of systems to 2.4.2pre [[phab:T153468|T153468]]
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:07 vgutierrez: restarting acme-chief on acmechief1001 to catch up with python3-cryptography upgrades - [[phab:T234131|T234131]]
* 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acme-chief hosts - [[phab:T234131|T234131]]
* 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:03 moritzm: rebalancing ganeti/row_B after rolling reboot
* 16:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:57 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acmechief-test1001 - [[phab:T234131|T234131]]
* 16:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 08:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:25 brennen@deploy1002: Pruned MediaWiki: 1.40.0-wmf.2 (duration: 02m 02s)
* 08:00 moritzm: draining ganeti2003 for upcoming reboot (combined kernel/qemu security updates)
* 16:24 brennen@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]] (duration: 28m 55s)
* 07:00 hashar: gerrit: forcing reindex of changes # [[phab:T233989|T233989]]
* 16:21 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns4003.wikimedia.org with OS bullseye
* 06:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 16:03 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 06:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 16:00 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 06:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:28 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3314 schema change - [[phab:T233625|T233625]]', diff saved to https://phabricator.wikimedia.org/P9223 and previous config saved to /var/cache/conftool/dbconfig/20191001-061956-marostegui.json
* 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 15:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS buster
* 00:12 mutante: phabricator - upgrading PHP version to 7.2.22 - [[phab:T230024|T230024]]
* 15:54 brennen@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 15:53 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 15:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 15:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 15:51 brennen: restarting `/usr/bin/scap stage-train --yes auto` after failed staging ([[phab:T314193|T314193]]), cc: ^demon
* 15:48 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 15:47 sukhe: disable Puppet on A:cp and A:eqiad for [[phab:T309651|T309651]]
* 15:42 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 15:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
* 15:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
* 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS buster
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
* 15:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
* 15:10 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2002.codfw.wmnet with OS buster
* 15:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 15:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 15:06 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:02 moritzm: installing snakeyaml security updates
* 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:55 papaul: maintenance complete on msw1-codfw
* 14:51 sukhe: disable Puppet on A:cp and A:esams for [[phab:T309651|T309651]]
* 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
* 14:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
* 14:40 moritzm: installing maven-shared-utils security updates
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2002.codfw.wmnet with OS buster
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
* 14:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
* 14:30 papaul: ongoing maintenance on msw1-codfw
* 14:29 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 14:27 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 14:22 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 14:14 XioNoX: netbox - Move VRRP IPs to FHRP group feature - [[phab:T311218|T311218]]
* 14:13 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 14:12 filippo@cumin1001: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 14:12 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/tests/phpunit/: Backport: [[gerrit:838094{{!}}Revert "Introduce LanguageVariantConverter" (T319282)]] (2/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 52s)
* 14:12 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/includes/: Backport: [[gerrit:838094{{!}}Revert "Introduce LanguageVariantConverter" (T319282)]] (1/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 43s)
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/extensions/Kartographer/modules/dialog: Backport: [[gerrit:838097{{!}}Log basic nearby and fullscreen events (T315972, T318678)]] (no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 42s)
* 14:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:55 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
* 13:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
* 13:49 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35347 and previous config saved to /var/cache/conftool/dbconfig/20221004-134947-root.json
* 13:49 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 13:48 sukhe: disable Puppet on A:cp and A:eqsin for [[phab:T309651|T309651]]
* 13:47 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:42 awight: EU backport window finished.
* 13:40 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 13:38 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
* 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:36 awight@deploy1002: Finished scap: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]] (duration: 06m 49s)
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
* 13:35 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "filippo test - filippo@cumin1001"
* 13:34 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "filippo test - filippo@cumin1001"
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35346 and previous config saved to /var/cache/conftool/dbconfig/20221004-133442-root.json
* 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
* 13:31 jbond: re-enable puppet post deploy a puppetmaster change 838144
* 13:30 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
* 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
* 13:30 awight@deploy1002: awight and awight: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:29 awight@deploy1002: Started scap: Backport for [[gerrit:836804{{!}}Wire new event stream for maps interactions (T315972 T318678)]]
* 13:28 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
* 13:27 awight@deploy1002: Finished scap: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]] (duration: 05m 16s)
* 13:24 jbond: disable puppet to deploy a puppetmaster change 838144
* 13:22 awight@deploy1002: awight and stang: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:21 awight@deploy1002: Started scap: Backport for [[gerrit:837757{{!}}ukwiki: Create flood group (T319243)]]
* 13:21 awight@deploy1002: Finished scap: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]] (duration: 12m 48s)
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35345 and previous config saved to /var/cache/conftool/dbconfig/20221004-131937-root.json
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 awight@deploy1002: awight and stang: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:08 awight@deploy1002: Started scap: Backport for [[gerrit:837756{{!}}throttle: Add throttle rule for 2022-10-13 (T319244)]]
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35343 and previous config saved to /var/cache/conftool/dbconfig/20221004-130432-root.json
* 12:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35342 and previous config saved to /var/cache/conftool/dbconfig/20221004-124927-root.json
* 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35341 and previous config saved to /var/cache/conftool/dbconfig/20221004-123422-root.json
* 12:31 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]] (duration: 00m 58s)
* 12:30 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]]
* 12:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 12:26 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]] (duration: 00m 14s)
* 12:26 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # [[phab:T310458|T310458]]
* 12:21 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35340 and previous config saved to /var/cache/conftool/dbconfig/20221004-121917-root.json
* 12:14 volans: uploaded python3-gjson_0.1.0 to apt.wikimedia.org bullseye-wikimedia
* 12:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 12:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
* 12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host sessionstore2001.codfw.wmnet with OS buster
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35339 and previous config saved to /var/cache/conftool/dbconfig/20221004-120413-root.json
* 11:55 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 11:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
* 11:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
* 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 11:22 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 11:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 11:05 jayme: published calico 3.23.3 debian packages in bullseye component/calico323 as well as corresponding docker images - [[phab:T307943|T307943]]
* 11:04 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
* 10:55 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 10:54 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS buster
* 10:53 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 10:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 135158
* 10:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 135158
* 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9119
* 10:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9119
* 10:41 moritzm: installing expat security updates
* 09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.maps.roll-restart (exit_code=1) rolling restart_daemons on A:maps-codfw
* 09:47 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 09:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 09:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 09:42 jayme: deployed istio-ingressgateway with additional envoy native metrics to wikikube codfw and eqiad
* 09:40 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
* 09:37 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-codfw
* 09:36 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
* 09:36 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
* 09:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 20 hosts
* 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 20 hosts
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35338 and previous config saved to /var/cache/conftool/dbconfig/20221004-093530-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35337 and previous config saved to /var/cache/conftool/dbconfig/20221004-092025-root.json
* 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35336 and previous config saved to /var/cache/conftool/dbconfig/20221004-090520-root.json
* 08:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: php7.2 removal
* 08:55 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: php7.2 removal
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35335 and previous config saved to /var/cache/conftool/dbconfig/20221004-085015-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35334 and previous config saved to /var/cache/conftool/dbconfig/20221004-083511-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35333 and previous config saved to /var/cache/conftool/dbconfig/20221004-082005-root.json
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35332 and previous config saved to /var/cache/conftool/dbconfig/20221004-080500-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P35331 and previous config saved to /var/cache/conftool/dbconfig/20221004-080338-root.json
* 07:52 moritzm: installing libdatetime-timezone-perl updates (catching up with latest timezone changes)
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35330 and previous config saved to /var/cache/conftool/dbconfig/20221004-074955-root.json
* 07:36 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 07:36 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35329 and previous config saved to /var/cache/conftool/dbconfig/20221004-072158-root.json
* 07:16 elukey: restart kafka on kafka-logging1001 to pick up its new PKI TLS cert
* 07:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35328 and previous config saved to /var/cache/conftool/dbconfig/20221004-070653-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35327 and previous config saved to /var/cache/conftool/dbconfig/20221004-065148-root.json
* 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35326 and previous config saved to /var/cache/conftool/dbconfig/20221004-063643-root.json
* 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 25885
* 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 25885
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35325 and previous config saved to /var/cache/conftool/dbconfig/20221004-062138-root.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35324 and previous config saved to /var/cache/conftool/dbconfig/20221004-060633-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35323 and previous config saved to /var/cache/conftool/dbconfig/20221004-055128-root.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35322 and previous config saved to /var/cache/conftool/dbconfig/20221004-053623-root.json
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2019-09-30 ==
== 2022-10-03 ==
* 23:28 niharika29@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CentralNotice/resources/infrastructure/: CentralNotice: Replace deprecated editToken with csrfToken - [[phab:T233538|T233538]] (duration: 00m 57s)
* 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:23 AndyRussG: updated fruec from {{Gerrit|c591bd653b}} to {{Gerrit|18d89675d0}}
* 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
* 21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
* 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 21:47 mutante: mw1290 - scap pull to get it in sync with latest deployment - it was down during scap run for [[phab:T234153|T234153]]
* 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 21:42 jforrester@deploy1001: Synchronized robots.txt: Remove old InternetArchive bot rule that's been disabled since 2008 [[phab:T7582|T7582]] (duration: 00m 57s)
* 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
* 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T222539|T222539]] Drop no-op hacky disablement of MessageBlobStore::clear() (duration: 05m 13s)
* 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
* 21:38 James_F: sync failure on mw1290.eqiad.wmnet – Connection timed out
* 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga; tested connections to 3 servers and all work
* 21:26 mutante: mw1290 - downtimed for onsite work on mgmt, depooled earlier
* 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
* 21:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 19:22 ryankemper: [Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '<nowiki>{</nowiki>"transient":<nowiki>{</nowiki>"cluster.routing.allocation.exclude":<nowiki>{</nowiki>"_host": "","_name": "elastic1066-production-search-psi-eqiad"<nowiki>}}}</nowiki>'`); will restart elasticsearch-psi after shards drain
* 21:08 XioNoX: delete BGP to AS131285 on cr1-eqsin
* 19:15 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 20:43 arlolra: Updated Parsoid to {{Gerrit|1922eb6}} ([[phab:T233459|T233459]], [[phab:T230359|T230359]], [[phab:T208070|T208070]])
* 18:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 20:43 arlolra: [[phab:T208070|T208070]]
* 18:41 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 20:34 arlolra@deploy1001: Finished deploy [parsoid/deploy@a6da34c]: Updating Parsoid to {{Gerrit|1922eb6}} (duration: 08m 39s)
* 18:34 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 20:25 arlolra@deploy1001: Started deploy [parsoid/deploy@a6da34c]: Updating Parsoid to {{Gerrit|1922eb6}}
* 18:30 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:06 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to {{Gerrit|131b83f}} (duration: 05m 55s)
* 18:30 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
* 20:00 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to {{Gerrit|131b83f}}
* 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 19:15 XenoRyet: Updated payments-wiki from {{Gerrit|5193dcdfa9}} to {{Gerrit|80dead6444}}
* 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 17:37 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix [[phab:T234223|T234223]] (duration: 03m 03s)
* 18:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 17:33 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix [[phab:T234223|T234223]]
* 18:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 17:24 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix [[phab:T234223|T234223]]
* 18:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 17:18 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix [[phab:T234223|T234223]] (duration: 00m 05s)
* 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 17:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix [[phab:T234223|T234223]]
* 17:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
* 17:15 twentyafterfour@deploy1001: deploy aborted: fix [[phab:T234223|T234223]] (duration: 06m 24s)
* 17:41 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003
* 17:10 twentyafterfour: deploy failed
* 17:41 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4003
* 17:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix [[phab:T234223|T234223]]
* 17:40 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:08 twentyafterfour: deploying minor update to phatality to fix [[phab:T234223|T234223]]
* 17:37 robh@cumin2002: START - Cookbook sre.dns.netbox
* 16:35 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:29 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
* 16:34 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0aa4b4b5ab9a2e4}} (duration: 00m 57s)
* 17:29 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 837727: remove dns4001 for anycast neighbors."
* 16:34 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication [[phab:T234226|T234226]] (duration: 01m 17s)
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4001.wikimedia.org
* 16:32 krinkle@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|0aa4b4b5ab9a2e4}} (duration: 00m 57s)
* 17:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:32 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication [[phab:T234226|T234226]]
* 17:08 robh@cumin2002: START - Cookbook sre.dns.netbox
* 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4001.wikimedia.org
* 16:25 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:49 moritzm: installing console-setup bugfixes from Buster 10.1 point release
* 16:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:46 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:46 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
* 15:42 moritzm: failover Ganeti master in codfw to ganeti2001
* 16:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
* 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:29 moritzm: draining ganeti2007 for upcoming reboot (combined kernel/qemu security updates)
* 16:24 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] (duration: 04m 16s)
* 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:08 moritzm: draining ganeti2006 for upcoming reboot (combined kernel/qemu security updates)
* 16:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:20 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 13:54 moritzm: draining ganeti2005 for upcoming reboot (combined kernel/qemu security updates)
* 16:20 urbanecm@deploy1002: Started scap: Backport for [[gerrit:837696{{!}}throttle: Remove out of date rules]]
* 13:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cae49b85d2d780e34b553789d56d76bac4a62c48}}: throttle: Add throttle rule for 2022-10-06 ([[phab:T319212|T319212]]) (duration: 04m 21s)
* 13:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:14 sukhe: disable Puppet on cp hosts in codfw: rolling out [[phab:T309651|T309651]]
* 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:15 sukhe: disable Puppet on cp hosts in ulsfo: rolling out [[phab:T309651|T309651]]
* 12:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35320 and previous config saved to /var/cache/conftool/dbconfig/20221003-151438-root.json
* 12:33 kart_: Update cxserver to 2019-09-26-034732-production ([[phab:T233834|T233834]], [[phab:T232674|T232674]], [[phab:T233085|T233085]])
* 15:06 papaul: maintenance complete on mr1-esams
* 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35319 and previous config saved to /var/cache/conftool/dbconfig/20221003-145933-root.json
* 12:29 jbond42: offline puppetmaster2002 to reimage https://gerrit.wikimedia.org/r/c/operations/puppet/+/539322
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35318 and previous config saved to /var/cache/conftool/dbconfig/20221003-144428-root.json
* 12:27 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 14:35 sukhe: upgrade A:cp and A:drmrs to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 12:24 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 14:31 papaul: ongoing maintenance on mr1-esams
* 12:00 Urbanecm: EU SWAT done #2
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35317 and previous config saved to /var/cache/conftool/dbconfig/20221003-142923-root.json
* 12:00 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|3f4f242}}: New throttle rule for Czech wiki course ([[phab:T234113|T234113]]) (duration: 00m 56s)
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35316 and previous config saved to /var/cache/conftool/dbconfig/20221003-141417-root.json
* 11:57 Urbanecm: Reopen EU SWAT to deploy throttle rule for October 02 ([[phab:T234113|T234113]])
* 14:08 sukhe: upgrade cp4026, cp4032 to ATS 9.1.3-1wm2 from 9.1.3-1wm1: [[phab:T309651|T309651]]
* 11:54 raynor: EU SWAT finished
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35315 and previous config saved to /var/cache/conftool/dbconfig/20221003-135912-root.json
* 11:54 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:538296{{!}}Enable alternate mobile link for it, nl, ko wikis. (T206497)]] (duration: 00m 57s)
* 13:57 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm2_amd64.changes: [[phab:T309651|T309651]]
* 11:27 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:539517{{!}}Enable CX out of beta in Tagalog and Central Bikol WPs (T233006, T233007)]] (duration: 00m 59s)
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35314 and previous config saved to /var/cache/conftool/dbconfig/20221003-134407-root.json
* 11:20 hashar: Restarting Docker on integration-agent-puppet-docker-1001 # [[phab:T234197|T234197]]
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35313 and previous config saved to /var/cache/conftool/dbconfig/20221003-134024-root.json
* 11:08 hashar: Restarting Docker on CI agents to clear out some docker/iptables oddity # [[phab:T234197|T234197]]
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35312 and previous config saved to /var/cache/conftool/dbconfig/20221003-132902-root.json
* 10:48 hashar: CI outage is tracked in https://phabricator.wikimedia.org/T234197
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35311 and previous config saved to /var/cache/conftool/dbconfig/20221003-132519-root.json
* 10:42 moritzm: draining ganeti2004 for upcoming reboot (combined kernel/qemu security updates)
* 13:18 vgutierrez: enforcing origin-form{{!}}asterisk-form for request-target on varnish (could trigger spikes of HTTP 400 errors) - [[phab:T318676|T318676]]
* 10:40 hashar: CI down due to some DNS related failure on the hosts :-\
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35310 and previous config saved to /var/cache/conftool/dbconfig/20221003-131014-root.json
* 10:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35308 and previous config saved to /var/cache/conftool/dbconfig/20221003-125509-root.json
* 10:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35307 and previous config saved to /var/cache/conftool/dbconfig/20221003-124004-root.json
* 09:30 moritzm: uploading ferm 2.4.1+wmf2+deb9u1 for stretch-wikimedia, fixes AAAA lookups ([[phab:T153468|T153468]])
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35306 and previous config saved to /var/cache/conftool/dbconfig/20221003-122459-root.json
* 09:11 moritzm: draining ganeti2002 for upcoming reboot (combined kernel/qemu security updates)
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35305 and previous config saved to /var/cache/conftool/dbconfig/20221003-120954-root.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3314 for a schema change - [[phab:T233625|T233625]]', diff saved to https://phabricator.wikimedia.org/P9217 and previous config saved to /var/cache/conftool/dbconfig/20190930-091043-marostegui.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P35303 and previous config saved to /var/cache/conftool/dbconfig/20221003-120208-root.json
* 08:01 moritzm: installing e2fsprogs security updates on Stretch/Buster
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 07:56 marostegui: Stop dbstore1003:3311 for troubleshooting
* 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
* 06:47 moritzm: installing exim security updates on buster
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35302 and previous config saved to /var/cache/conftool/dbconfig/20221003-115449-root.json
* 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
* 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 11:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS buster
* 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
* 10:52 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS buster
* 10:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
* 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 10:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS buster
* 10:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 10:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 10:39 hnowlan: starting cassandra on reimaged sessionstore1002
* 10:37 _joe_: remove stale druid.svc.eqiad.wmnet certificate from the puppetmaster CA; it had expired anyway
* 10:32 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 10:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 10:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 10:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
* 10:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS buster
* 10:00 hnowlan: c-foreach-nt drain on sessionstore1002
* 10:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 10:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35300 and previous config saved to /var/cache/conftool/dbconfig/20221003-092519-root.json
* 09:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31133
* 09:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31133
* 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62044
* 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62044
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35299 and previous config saved to /var/cache/conftool/dbconfig/20221003-091014-root.json
* 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P35297 and previous config saved to /var/cache/conftool/dbconfig/20221003-085840-root.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35296 and previous config saved to /var/cache/conftool/dbconfig/20221003-085509-root.json
* 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12975
* 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12975
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35295 and previous config saved to /var/cache/conftool/dbconfig/20221003-085007-root.json
* 08:40 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5001.eqsin.wmnet
* 08:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35294 and previous config saved to /var/cache/conftool/dbconfig/20221003-084004-root.json
* 08:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3303
* 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35293 and previous config saved to /var/cache/conftool/dbconfig/20221003-083729-root.json
* 08:36 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
* 08:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35292 and previous config saved to /var/cache/conftool/dbconfig/20221003-083502-root.json
* 08:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5001.eqsin.wmnet
* 08:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15557
* 08:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15557
* 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12975
* 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12975
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35291 and previous config saved to /var/cache/conftool/dbconfig/20221003-082459-root.json
* 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30781
* 08:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30781
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35290 and previous config saved to /var/cache/conftool/dbconfig/20221003-082224-root.json
* 08:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39386
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35289 and previous config saved to /var/cache/conftool/dbconfig/20221003-081955-root.json
* 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39386
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35288 and previous config saved to /var/cache/conftool/dbconfig/20221003-080954-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35287 and previous config saved to /var/cache/conftool/dbconfig/20221003-080719-root.json
* 08:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35286 and previous config saved to /var/cache/conftool/dbconfig/20221003-080556-root.json
* 08:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16509
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35285 and previous config saved to /var/cache/conftool/dbconfig/20221003-080451-root.json
* 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178', diff saved to https://phabricator.wikimedia.org/P35284 and previous config saved to /var/cache/conftool/dbconfig/20221003-075643-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35283 and previous config saved to /var/cache/conftool/dbconfig/20221003-075449-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35282 and previous config saved to /var/cache/conftool/dbconfig/20221003-075214-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35281 and previous config saved to /var/cache/conftool/dbconfig/20221003-075051-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35280 and previous config saved to /var/cache/conftool/dbconfig/20221003-074946-root.json
* 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16637
* 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16637
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35279 and previous config saved to /var/cache/conftool/dbconfig/20221003-073944-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35278 and previous config saved to /var/cache/conftool/dbconfig/20221003-073709-root.json
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 XioNoX: cr2-drmrs# set chassis fpc 0 sampling-instance pmacct
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35277 and previous config saved to /var/cache/conftool/dbconfig/20221003-073627-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200', diff saved to https://phabricator.wikimedia.org/P35276 and previous config saved to /var/cache/conftool/dbconfig/20221003-073556-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35275 and previous config saved to /var/cache/conftool/dbconfig/20221003-073546-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35274 and previous config saved to /var/cache/conftool/dbconfig/20221003-073441-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35273 and previous config saved to /var/cache/conftool/dbconfig/20221003-072741-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35272 and previous config saved to /var/cache/conftool/dbconfig/20221003-072204-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35271 and previous config saved to /var/cache/conftool/dbconfig/20221003-072122-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35270 and previous config saved to /var/cache/conftool/dbconfig/20221003-072041-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35269 and previous config saved to /var/cache/conftool/dbconfig/20221003-071936-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35268 and previous config saved to /var/cache/conftool/dbconfig/20221003-071236-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35267 and previous config saved to /var/cache/conftool/dbconfig/20221003-070659-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35266 and previous config saved to /var/cache/conftool/dbconfig/20221003-070617-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35265 and previous config saved to /var/cache/conftool/dbconfig/20221003-070536-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35264 and previous config saved to /var/cache/conftool/dbconfig/20221003-070431-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P35263 and previous config saved to /var/cache/conftool/dbconfig/20221003-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35262 and previous config saved to /var/cache/conftool/dbconfig/20221003-065731-root.json
* 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6128
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35261 and previous config saved to /var/cache/conftool/dbconfig/20221003-065154-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35260 and previous config saved to /var/cache/conftool/dbconfig/20221003-065112-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35259 and previous config saved to /var/cache/conftool/dbconfig/20221003-065031-root.json
* 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6128
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P35258 and previous config saved to /var/cache/conftool/dbconfig/20221003-064638-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35257 and previous config saved to /var/cache/conftool/dbconfig/20221003-064226-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35256 and previous config saved to /var/cache/conftool/dbconfig/20221003-063607-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35255 and previous config saved to /var/cache/conftool/dbconfig/20221003-063527-root.json
* 06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11039
* 06:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11039
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35254 and previous config saved to /var/cache/conftool/dbconfig/20221003-062721-root.json
* 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5400
* 06:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5400
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35253 and previous config saved to /var/cache/conftool/dbconfig/20221003-062102-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35252 and previous config saved to /var/cache/conftool/dbconfig/20221003-062022-root.json
* 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
* 06:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35251 and previous config saved to /var/cache/conftool/dbconfig/20221003-061216-root.json
* 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35250 and previous config saved to /var/cache/conftool/dbconfig/20221003-060557-root.json
* 06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35249 and previous config saved to /var/cache/conftool/dbconfig/20221003-055711-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P35248 and previous config saved to /var/cache/conftool/dbconfig/20221003-055401-root.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35247 and previous config saved to /var/cache/conftool/dbconfig/20221003-055052-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P35246 and previous config saved to /var/cache/conftool/dbconfig/20221003-054245-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35245 and previous config saved to /var/cache/conftool/dbconfig/20221003-054206-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P35244 and previous config saved to /var/cache/conftool/dbconfig/20221003-052927-root.json


== 2019-09-28 ==
== 2022-10-02 ==
* 16:28 vgutierrez: restarting acme-chief on acmechief1001
* 08:13 elukey: `apt-get clean` on an-airflow1001 to free some space on the root partition


== 2019-09-27 ==
== 2022-10-01 ==
* 22:44 mutante: phab2001 - apt-get autoremove - remove unused python and ruby packages
* 13:24 fab@deploy1002: Finished deploy [airflow-dags/research@44a1158]: (no justification provided) (duration: 00m 08s)
* 22:36 mutante: phab2001 - upgrade php7.2 packages to 7.2.22 ([[phab:T230024|T230024]])
* 13:24 fab@deploy1002: Started deploy [airflow-dags/research@44a1158]: (no justification provided)
* 22:03 mutante: webperf1001, webperf2001: restart envoyproxy to pick up new cert with the right subject alt. names
* 13:12 fab@deploy1002: Finished deploy [airflow-dags/research@d6b3e82]: (no justification provided) (duration: 03m 35s)
* 18:22 mutante: mwdebug1001, mwdebug1002 - deleted from /srv/mediawiki/: php-1.34.0-wmf.16, .17, .18, .19 and .20 (current is .24) - usage back to about 57% ([[phab:T234063|T234063]])
* 13:08 fab@deploy1002: Started deploy [airflow-dags/research@d6b3e82]: (no justification provided)
* 18:17 mutante: mwdebug1001, mwdebug1002 - apt-get clean saves about 3GB and gets usage down from 94% to 87% on / ([[phab:T234063|T234063]])
* 16:01 XioNoX: delete BGP to AS34305 on cr2-esams
* 15:34 elukey: update pcc facts to add new hosts
* 15:02 moritzm: installing usb.ids update from Buster 10.1 point release
* 14:45 moritzm: installing ncurses bugfix update from Buster 10.1 point release
* 14:39 moritzm: installing postgresql-common bugfix update from Buster 10.1 point release
* 14:32 effie: Disable puppet and reload apache on mw* for 539465 and 539488 - [[phab:T229792|T229792]]
* 13:33 marostegui: Set candidate masters in dbctl [[phab:T234039|T234039]]
* 13:31 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:29 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 13:16 moritzm: reimaging auth1002 to buster
* 13:09 akosiaris: reboot ganeti2001 [[phab:T233906|T233906]]
* 13:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 13:03 effie: Disable puppet on mwmaint1002 to test noc.wikimedia.org with PHP7
* 12:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:48 moritzm: installing openldap security updates on Buster
* 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:37 moritzm: killing stray processes from old openjdk-8 build on boron (probably test suite not properly terminated)
* 12:30 moritzm: installing glib2.0 security updates on Buster
* 12:14 moritzm: reimaging auth2001 to buster
* 12:06 moritzm: install gnupg2 security update from Buster 10.1 point release
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 [[phab:T233625|T233625]]', diff saved to https://phabricator.wikimedia.org/P9213 and previous config saved to /var/cache/conftool/dbconfig/20190927-104914-marostegui.json
* 10:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 10:02 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for Czech course ([[phab:T234024|T234024]]) (duration: 00m 59s)
* 09:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:06 moritzm: running a few ferm tests on cp1008, puppet disabled
* 07:36 godog: swift eqiad-prod: remove ms-be1027 - [[phab:T233289|T233289]]
* 05:42 XioNoX: remove tcp-mss clamping from cr2-eqiad - [[phab:T232602|T232602]]
* 05:30 XioNoX: remove tcp-mss clamping from cr2-eqord - [[phab:T232602|T232602]]
* 05:23 XioNoX: remove tcp-mss clamping from cr1-eqiad - [[phab:T232602|T232602]]
* 00:53 twentyafterfour: hotfixing phabricator fatal exception refs [[phab:T233998|T233998]]


== 2019-09-26 ==
== 2022-09-30 ==
* 22:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T211620|T211620]] Enable emails for certain notification types by default on officewiki (duration: 00m 56s)
* 23:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgPageTriageNoIndexTemplates, never read (duration: 00m 57s)
* 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:02 jforrester@deploy1001: Synchronized wmf-config/filebackend.php: [[phab:T228547|T228547]] Stop sharding wgFileBackends shardViaHashLevels for math-render (duration: 00m 56s)
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35243 and previous config saved to /var/cache/conftool/dbconfig/20220930-232546-ladsgroup.json
* 21:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T228547|T228547]] Stop setting wgMathFileBackend, wgMathPath, wgMathDirectory (unused) (duration: 00m 56s)
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35242 and previous config saved to /var/cache/conftool/dbconfig/20220930-231040-ladsgroup.json
* 21:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T228547|T228547]] Stop setting wgTexvc, wgMathTexvcCheckExecutable, wgMathCheckFiles (unused) (duration: 01m 00s)
* 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P35241 and previous config saved to /var/cache/conftool/dbconfig/20220930-225534-ladsgroup.json
* 20:53 ejegg: updated fundraising CiviCRM from {{Gerrit|52d2a24404}} to {{Gerrit|6d90d0cf06}}
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35240 and previous config saved to /var/cache/conftool/dbconfig/20220930-224027-ladsgroup.json
* 19:58 phedenskog@deploy1001: Finished deploy [performance/navtiming@1880a79]: Test deploy (duration: 00m 05s)
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 19:58 phedenskog@deploy1001: Started deploy [performance/navtiming@1880a79]: Test deploy
* 20:54 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 19:52 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
* 18:30 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 19:52 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
* 18:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 19:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
* 18:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 19:46 phedenskog@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
* 17:43 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 19:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.24 refs [[phab:T220749|T220749]]
* 17:24 bblack@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp4045.ulsfo.wmnet with OS bullseye
* 19:17 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (test) (duration: 00m 16s)
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35237 and previous config saved to /var/cache/conftool/dbconfig/20220930-170620-ladsgroup.json
* 19:17 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release (test)
* 17:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 19:13 twentyafterfour: preparing to deploy the mediawiki train for 1.34.0-wmf.24. refs [[phab:T220749|T220749]]
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 18:45 ayounsi@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 22s)
* 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35236 and previous config saved to /var/cache/conftool/dbconfig/20220930-170546-ladsgroup.json
* 18:44 ayounsi@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
* 16:54 bblack@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 18:35 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: Stop setting various static settings, now set in IS (duration: 01m 04s)
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35235 and previous config saved to /var/cache/conftool/dbconfig/20220930-165040-ladsgroup.json
* 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101) (duration: 06m 04s)
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P35234 and previous config saved to /var/cache/conftool/dbconfig/20220930-163533-ladsgroup.json
* 18:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set last static Cirrus settings directly in IS (duration: 01m 07s)
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35233 and previous config saved to /var/cache/conftool/dbconfig/20220930-162027-ladsgroup.json
* 18:29 mforns@deploy1001: Started deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101)
* 15:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 18:25 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 23s)
* 14:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 18:25 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
* 13:51 moritzm: installing puppetdb-test2001 [[phab:T318931|T318931]]
* 18:17 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop indirectly setting wgWMESearchRelevancePages (duration: 01m 04s)
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:15 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 31s)
* 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:15 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
* 13:23 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 18:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgWMESearchRelevancePages directly in InitialiseSettings (duration: 01m 04s)
* 13:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 18:07 ayounsi@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 55s)
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 18:06 ayounsi@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
* 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 18:04 mutante: running mcrouter_generate_certs to add a cert for wtp2001.codfw.wmnet for [[phab:T233654|T233654]]
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35232 and previous config saved to /var/cache/conftool/dbconfig/20220930-131638-root.json
* 18:04 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 03s)
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35231 and previous config saved to /var/cache/conftool/dbconfig/20220930-130133-root.json
* 18:04 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35230 and previous config saved to /var/cache/conftool/dbconfig/20220930-124628-root.json
* 18:03 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 42s)
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35229 and previous config saved to /var/cache/conftool/dbconfig/20220930-123123-root.json
* 18:02 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35228 and previous config saved to /var/cache/conftool/dbconfig/20220930-121618-root.json
* 17:58 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting bits of the CirrusSearch timeouts arrays, already set in IS (duration: 01m 04s)
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35227 and previous config saved to /var/cache/conftool/dbconfig/20220930-120113-root.json
* 17:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set the whole of the CirrusSearch timeouts arrays directly (duration: 01m 00s)
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetdb-test2001.codfw.wmnet
* 17:49 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting static values now set in InitialiseSettings (duration: 01m 04s)
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35226 and previous config saved to /var/cache/conftool/dbconfig/20220930-114605-root.json
* 17:49 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T233835|T233835]], [[phab:T233246|T233246]])
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35225 and previous config saved to /var/cache/conftool/dbconfig/20220930-113101-root.json
* 17:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move static settings from CirrusSettings-common (duration: 01m 05s)
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P35224 and previous config saved to /var/cache/conftool/dbconfig/20220930-112307-root.json
* 17:43 ppchelko@deploy1001: Finished deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events [[phab:T225211|T225211]] (duration: 02m 04s)
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetdb-test2001.codfw.wmnet on all recursors
* 17:41 ppchelko@deploy1001: Started deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events [[phab:T225211|T225211]]
* 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache puppetdb-test2001.codfw.wmnet on all recursors
* 17:35 elukey: run apt-get autoremove on stat* and notebook* to clean up old python2 deps
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:31 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T233835|T233835]], [[phab:T233246|T233246]])
* 11:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 17:14 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb-test2001.codfw.wmnet
* 17:13 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqiad
* 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35223 and previous config saved to /var/cache/conftool/dbconfig/20220930-104004-ladsgroup.json
* 17:11 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 17:08 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
* 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
* 16:40 papaul: upgrading firmware on scs-c1-codfw
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35222 and previous config saved to /var/cache/conftool/dbconfig/20220930-103943-ladsgroup.json
* 16:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕛☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s codfw
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35221 and previous config saved to /var/cache/conftool/dbconfig/20220930-102436-ladsgroup.json
* 15:56 cdanis: sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s esams
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P35220 and previous config saved to /var/cache/conftool/dbconfig/20220930-100930-ladsgroup.json
* 15:35 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s ulsfo
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35219 and previous config saved to /var/cache/conftool/dbconfig/20220930-095423-ladsgroup.json
* 15:15 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqsin
* 09:42 moritzm: installing Linux 5.10.140 updates on Bullseye hosts (released via 11.5 point release), just rollout of the package, no reboots involved
* 15:06 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap (duration: 02m 44s)
* 07:37 XioNoX: add RPKI ROAs for 185.71.138.0/24 and 2001:67c:930::/48
* 15:03 mforns@deploy1001: Started deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:00 cdanis: dbctl schema migration done [[phab:T229677|T229677]]
* 07:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
* 14:47 cdanis: dbctl schema migration on instances to add note field https://wikitech.wikimedia.org/wiki/Dbctl#Schema_upgrades [[phab:T229677|T229677]]
* 07:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:43 cdanis@cumin1001: dbctl commit (dc=all): 'dbctl 1.2.0 adds hostByName to the output, but it is not used by Mediawiki; this commit is the first made with the new release; no-op change', diff saved to https://phabricator.wikimedia.org/P9208 and previous config saved to /var/cache/conftool/dbconfig/20190926-144328-cdanis.json
* 07:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:41 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s cumin
* 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s puppetmaster
* 07:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
* 14:36 cdanis: ✔️ cdanis@puppetmaster1001.eqiad.wmnet ~ 🕥☕ sudo apt install python3-conftool
* 07:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52320
* 14:19 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕥☕ sudo -E reprepro -C main include jessie-wikimedia conftool_1.2.0-1+deb8u1_amd64.changes
* 07:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 52320
* 14:16 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕙☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.2.0-1+deb10u1_amd64.changes ; sudo -E reprepro -C main include stretch-wikimedia conftool_1.2.0-1_amd64.changes
* 07:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:31 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='Nederlandse Leeuw' /home/urbanecm/T233922 ([[phab:T233922|T233922]])
* 07:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 11:23 Urbanecm: EU SWAT done
* 07:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
* 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|96bba4c}}: Add wgMinervaCustomLogos for szlwiki ([[phab:T233104|T233104]]; 3/3) (duration: 01m 05s)
* 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 11:14 Urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg ([[phab:T233104|T233104]])
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35218 and previous config saved to /var/cache/conftool/dbconfig/20220930-070454-root.json
* 11:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: {{Gerrit|96bba4c}}: Add wgMinervaCustomLogos for szlwiki ([[phab:T233104|T233104]]; 2/3) (duration: 01m 05s)
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35217 and previous config saved to /var/cache/conftool/dbconfig/20220930-065844-root.json
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|7645e55}}: Enable reader demographic surveys in English, Polish, and Russian ([[phab:T232525|T232525]]) (duration: 01m 06s)
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35216 and previous config saved to /var/cache/conftool/dbconfig/20220930-064949-root.json
* 11:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35215 and previous config saved to /var/cache/conftool/dbconfig/20220930-064339-root.json
* 11:07 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.png: SWAT: {{Gerrit|96bba4c}}: Add wgMinervaCustomLogos for szlwiki ([[phab:T233104|T233104]]; 1/3) (duration: 01m 08s)
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35214 and previous config saved to /var/cache/conftool/dbconfig/20220930-063444-root.json
* 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35213 and previous config saved to /var/cache/conftool/dbconfig/20220930-062834-root.json
* 10:53 jbond42: reimaging puppetmaster1002 to buster
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35212 and previous config saved to /var/cache/conftool/dbconfig/20220930-061939-root.json
* 10:48 vgutierrez: switching from nginx to ats-tls on cp5007 - [[phab:T231627|T231627]]
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35211 and previous config saved to /var/cache/conftool/dbconfig/20220930-061329-root.json
* 09:55 moritzm: bouncing postgres on puppetdb1002/2002
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35210 and previous config saved to /var/cache/conftool/dbconfig/20220930-060434-root.json
* 09:18 vgutierrez: switching from nginx to ats-tls on cp1080 - [[phab:T231433|T231433]]
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35209 and previous config saved to /var/cache/conftool/dbconfig/20220930-055824-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P9203 and previous config saved to /var/cache/conftool/dbconfig/20190926-091348-marostegui.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35208 and previous config saved to /var/cache/conftool/dbconfig/20220930-054929-root.json
* 09:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@c419651]: Add nqo.wp.org - [[phab:T233833|T233833]] (duration: 21m 32s)
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35207 and previous config saved to /var/cache/conftool/dbconfig/20220930-054319-root.json
* 09:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35206 and previous config saved to /var/cache/conftool/dbconfig/20220930-053424-root.json
* 09:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35204 and previous config saved to /var/cache/conftool/dbconfig/20220930-052814-root.json
* 08:47 vgutierrez: switching from nginx to ats-tls on cp2008 - [[phab:T231433|T231433]]
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35203 and previous config saved to /var/cache/conftool/dbconfig/20220930-051919-root.json
* 08:43 mobrovac@deploy1001: Started deploy [restbase/deploy@c419651]: Add nqo.wp.org - [[phab:T233833|T233833]]
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35202 and previous config saved to /var/cache/conftool/dbconfig/20220930-051309-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9202 and previous config saved to /var/cache/conftool/dbconfig/20190926-084159-marostegui.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P35201 and previous config saved to /var/cache/conftool/dbconfig/20220930-051206-root.json
* 08:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P35200 and previous config saved to /var/cache/conftool/dbconfig/20220930-050533-root.json
* 08:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35199 and previous config saved to /var/cache/conftool/dbconfig/20220930-041937-ladsgroup.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from 1 to 100 - [[phab:T231018|T231018]]', diff saved to https://phabricator.wikimedia.org/P9201 and previous config saved to /var/cache/conftool/dbconfig/20190926-082233-marostegui.json
* 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9200 and previous config saved to /var/cache/conftool/dbconfig/20190926-081759-marostegui.json
* 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 08:13 vgutierrez: switching from nginx to ats-tls on cp3036 - [[phab:T231433|T231433]]
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35198 and previous config saved to /var/cache/conftool/dbconfig/20220930-041916-ladsgroup.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9199 and previous config saved to /var/cache/conftool/dbconfig/20190926-081144-marostegui.json
* 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35197 and previous config saved to /var/cache/conftool/dbconfig/20220930-040409-ladsgroup.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9198 and previous config saved to /var/cache/conftool/dbconfig/20190926-080949-marostegui.json
* 03:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P35196 and previous config saved to /var/cache/conftool/dbconfig/20220930-034903-ladsgroup.json
* 08:07 elukey: executed 'rmr /yarn-rmstore/analytics-test-hadoop/ZKRMStateRoot' on conf1004's zkCli.sh to clean up znodes - [[phab:T217057|T217057]]
* 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35195 and previous config saved to /var/cache/conftool/dbconfig/20220930-033356-ladsgroup.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to change binlog format', diff saved to https://phabricator.wikimedia.org/P9197 and previous config saved to /var/cache/conftool/dbconfig/20190926-080442-marostegui.json
* 00:31 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS bullseye
* 08:02 marostegui: Depool db1078 to restart mysql to change its binlog format to ROW
* 00:22 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 07:57 vgutierrez: switching from nginx to ats-tls on cp4023 - [[phab:T231433|T231433]]
* 07:49 godog: swift eqiad-prod: continue ms-be1027 decom - [[phab:T233289|T233289]]
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:42 moritzm: draining ganeti2001 for upcoming reboot (combined kernel/qemu security updates)
* 07:41 vgutierrez: switching from nginx to ats-tls on cp5003 - [[phab:T231433|T231433]]
* 07:10 marostegui: Power off db1114 for mainboard replacement [[phab:T229452|T229452]]
* 07:09 marostegui: Stop mysql on db1114 for mainboard replacement - [[phab:T229452|T229452]]
* 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
* 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:41 marostegui: Sanitize nqowiki on db1124:3313 and db2094:3313 - [[phab:T230543|T230543]]
* 06:39 marostegui: Deploy schema change on db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 [[phab:T233625|T233625]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 [[phab:T233625|T233625]]', diff saved to https://phabricator.wikimedia.org/P9196 and previous config saved to /var/cache/conftool/dbconfig/20190926-063555-marostegui.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): ' Repool db2088:3312 db2084:3315 db2087:3316 db2086:3317 [[phab:T233625|T233625]]', diff saved to https://phabricator.wikimedia.org/P9195 and previous config saved to /var/cache/conftool/dbconfig/20190926-062922-marostegui.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1081 - [[phab:T230784|T230784]]', diff saved to https://phabricator.wikimedia.org/P9194 and previous config saved to /var/cache/conftool/dbconfig/20190926-053029-marostegui.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1081 - [[phab:T230784|T230784]]', diff saved to https://phabricator.wikimedia.org/P9193 and previous config saved to /var/cache/conftool/dbconfig/20190926-051916-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give some API weight to db1081 - [[phab:T230784|T230784]]', diff saved to https://phabricator.wikimedia.org/P9192 and previous config saved to /var/cache/conftool/dbconfig/20190926-050937-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db1081 - [[phab:T230784|T230784]]', diff saved to https://phabricator.wikimedia.org/P9191 and previous config saved to /var/cache/conftool/dbconfig/20190926-050722-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 [[phab:T230784|T230784]]', diff saved to https://phabricator.wikimedia.org/P9190 and previous config saved to /var/cache/conftool/dbconfig/20190926-050140-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance [[phab:T230784|T230784]]', diff saved to https://phabricator.wikimedia.org/P9189 and previous config saved to /var/cache/conftool/dbconfig/20190926-050050-marostegui.json
* 05:00 marostegui: Starting s4 failover from db1081 to db1138 - [[phab:T230784|T230784]]
* 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 [[phab:T230784|T230784]]', diff saved to https://phabricator.wikimedia.org/P9188 and previous config saved to /var/cache/conftool/dbconfig/20190926-041508-marostegui.json
* 04:10 marostegui: Start pre-switchover s4 steps [[phab:T230784|T230784]]


== 2019-09-25 ==
== 2022-09-29 ==
* 21:59 bblack: remove GRE MTU hacks on archiva1001 gerrit2001 cobalt install1002 - [[phab:T232602|T232602]]
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
* 21:58 bblack: remove GRE MTU hacks on eqiad caches (cp1xxx) - [[phab:T232602|T232602]]
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
* 21:57 bblack: remove GRE MTU hacks on esams caches (cp3xxx) - [[phab:T232602|T232602]]
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
* 21:56 bblack: remove GRE MTU hacks on eqsin caches (cp5xxx) - [[phab:T232602|T232602]]
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
* 21:10 AndyRussG: update fruec from {{Gerrit|97128874bf}} to {{Gerrit|c591bd653b}}
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
* 21:00 ejegg: updated fundraising internal dashboard from {{Gerrit|4473c65af0}}
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance


== 2019-09-24 ==
== 2022-09-28 ==
* 23:38 mutante: gerrit service restart to switch LDAP backend
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS buster
* 23:35 bstorm_: wiki-replicas depooled labsdb1011
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2037']
* 23:33 mutante: gerrit2001 - restarting gerrit service
* 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2037']
* 23:30 mutante: switching LDAP servers used by Gerrit to readonly replicas. stop using so called "labs" config for LDAP backend.
* 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35103 and previous config saved to /var/cache/conftool/dbconfig/20220928-231719-ladsgroup.json
* 22:26 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.24 refs [[phab:T220749|T220749]]
* 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 23:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 22:20 ejegg: updated fundraising CiviCRM from {{Gerrit|d31c19a0}} to {{Gerrit|f3461a44}}
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35102 and previous config saved to /var/cache/conftool/dbconfig/20220928-213701-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35101 and previous config saved to /var/cache/conftool/dbconfig/20220928-213640-ladsgroup.json
* 21:21 ladsgroup@cumin1001


== 2019-09-23 ==
== 2022-09-27 ==
* 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 22:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1002.eqiad.wmnet with OS bullseye
* 23:50 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
* 22:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bullseye
* 23:50 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 22:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 23:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 21:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 23:43 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
* 21:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
* 23:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 21:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
* 21:32 catrope@deploy1001: Synchronized wmf-config/VariantSettings.php: Syncing no-op change for [[phab:T232419|T232419]] (duration: 00m 57s)
* 21:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bullseye
* 19:57 cdanis: [[phab:T233657|T233657]] ✔️ cdanis@cp4027.ulsfo.wmnet ~ 🕓🍵 sudo -i depool
* 21:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bullseye
* 19:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: {{Gerrit|2a7a125}}: Redefine hiwikisource extra namespaces ([[phab:T233365|T233365]]) (duration: 00m 57s)
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34971 and previous config saved to /var/cache/conftool/dbconfig/20220927-213028-ladsgroup.json
* 19:09 Urbanecm: Going to deploy one more last-time patch
* 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 18:51 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config, take 2 ([[phab:T233610|T233610]]) (duration: 00m 56s)
* 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 18:48 Urbanecm: Morning SWAT done
* 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34970 and previous config saved to /var/cache/conftool/dbconfig/20220927-213006-ladsgroup.json
* 18:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|37fcbdf}}: Fix: Move hiwikisource extra namespace to extra namespace section (duration: 00m 56s)
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|be2f9d4}}: Add localized Wikipedia wordmark for szlwiki ([[phab:T233104|T233104]]) (duration: 00m 55s)
* 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34969 and previous config saved to /var/cache/conftool/dbconfig/20220927-211500-ladsgroup.json
* 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: {{Gerrit|d397f5f}}: Add localized Wikipedia wordmark for szlwiki ([[phab:T233104|T233104]]) (duration: 00m 56s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|8f3f070}}: Disallow indexing discussion and user pages on eswiki ([[phab:T233562|T233562]]) (duration: 00m 56s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:21 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|6cb2042}}: New throttle rule for Wikimedia Chile editathon ([[phab:T233378|T233378]]) (duration: 00m 56s)
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:13 Urbanecm: Security deploy for [[phab:T207094|T207094]]
* 21:12 TheresNoTime: closing UTC late backport window
* 18:03 gilles: [[phab:T233095|T233095]] Purge articles for all wikis: foreachwiki maintenance/purgeList.php --all --verbose
* 21:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] (duration: 04m 53s)
* 17:59 gilles@deploy1001: Synchronized php-1.34.0-wmf.23/maintenance/purgeList.php: [[phab:T233095|T233095]] Make purgeList.php use getCdnUrls() (duration: 00m 56s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config ([[phab:T233610|T233610]]) (duration: 00m 58s)
* 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:53 elukey@deploy1001: Finished deploy [analytics/refinery@b99647e]: (no justification provided) (duration: 07m 24s)
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:46 elukey@deploy1001: Started deploy [analytics/refinery@b99647e]: (no justification provided)
* 21:06 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 16:33 Urbanecm: Remove my temporary adminship on bgwikinews ([[phab:T233322|T233322]])
* 21:06 samtar@deploy1002: Started scap: Backport for [[gerrit:835593{{!}}Remove figures from text extracts (T318727)]]
* 16:29 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: {{Gerrit|84afa44}}: Close bgwikinews, but allow sysops to edit ([[phab:T233322|T233322]]; 2/2) (duration: 00m 56s)
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] (duration: 06m 58s)
* 16:27 urbanecm@deploy1001: Synchronized dblists/closed.dblist: {{Gerrit|84afa44}}: Close bgwikinews, but allow sysops to edit ([[phab:T233322|T233322]]; 1/2) (duration: 00m 58s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:26 Urbanecm: mwscript createAndPromote.php --wiki=bgwikinews --sysop --force 'Martin Urbanec' - temporary ([[phab:T233322|T233322]])
* 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P34968 and previous config saved to /var/cache/conftool/dbconfig/20220927-205953-ladsgroup.json
* 13:21 moritzm: installing qemu security updates on remaining cloudvirt hosts
* 20:59 TheresNoTime: extending UTC late backport window
* 12:40 moritzm: rolling restart of graphoid on scb to pick up expat security update
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:05 moritzm: restarting apache on bast5001 to pick up expat security update
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:50 moritzm: restarting Apache/HHVM/PHP on mw1261-mw1265 after Expat security update
* 20:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:42 vgutierrez: switching cp4027 from nginx to ats-tls - [[phab:T231627|T231627]]
* 20:58 samtar@deploy1002: samtar and ssastry: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:35 moritzm: installing expat security updates
* 20:58 samtar@deploy1002: Started scap: Backport for [[gerrit:835594{{!}}Remove figures from text extracts (T318727)]]
* 11:33 awight: EU SWAT finished
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:31 awight@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/FileImporter: SWAT: [[gerrit:538566{{!}}Add change tags to all FileImport text revisions (T227849)]] (duration: 00m 57s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:23 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: [[gerrit:538577{{!}}Set item terms on write both up to Q40Mio (T225055)]] (duration: 00m 55s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:12 effie: Disable puppet and rolling restart of php7.2-fpm on mw[1321-1333] - [[phab:T219150|T219150]]
* 20:53 samtar@deploy1002: Finished scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] (duration: 05m 29s)
* 11:11 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: [[gerrit:538293{{!}}Add localized logos for the Zulu Wikipedia (T233424)]] (duration: 00m 56s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:06 awight@deploy1001: Synchronized static/images/project-logos: SWAT: [[gerrit:538291{{!}}Add localized logos for the Zulu Wikipedia (T233424)]] (duration: 00m 57s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:05 moritzm: uploaded openjdk 8u222-b10-1~deb10u1 to buster-wikimedia/component/jdk8 (bootstrap build, second boron build following) [[phab:T233604|T233604]]
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:538591{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 20:48 samtar@deploy1002: samtar and stang: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:538591{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 20:48 samtar@deploy1002: Started scap: Backport for [[gerrit:835681{{!}}romdwikimedia: Enable subpages in NS0 (T318491)]]
* 09:51 jynus: stopping db2102 mariadb to recover db
* 20:46 samtar@deploy1002: Finished scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] (duration: 05m 14s)
* 09:45 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'نعنوعه' 'مريانا_علي' ([[phab:T233585|T233585]])
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:44 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bnwiki --logwiki=metawiki 'Huangzonghao' 'HUANGZONGHAO' ([[phab:T233585|T233585]])
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host mc-wf1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:38 akosiaris: [[phab:T218184|T218184]] upload to apt.wikimedia.org/jessie-wikimedia apertium-dan-nor_1.4.0-1+wmf1, apertium-nno-nob_1.2.0-1+wmf1, apertium-swe-dan_0.8.0-2+wmf1, apertium-swe-nor_0.3.0-2+wmf1
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34967 and previous config saved to /var/cache/conftool/dbconfig/20220927-204446-ladsgroup.json
* 09:02 effie: Disable puppet and rolling restart php-fpm on mw[1312-1317,1339-1347]* - [[phab:T219150|T219150]]
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:31 elukey@deploy1001: Finished deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes (duration: 07m 26s)
* 20:41 samtar@deploy1002: samtar and ryankemper: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 08:24 elukey@deploy1001: Started deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes
* 20:41 samtar@deploy1002: Started scap: Backport for [[gerrit:833860{{!}}elastic: rebalance enwiki_content shard counts (T318270)]]
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9148 and previous config saved to /var/cache/conftool/dbconfig/20190923-082119-marostegui.json
* 20:38 samtar@deploy1002: Finished scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] (duration: 06m 02s)
* 07:41 godog: swift run swiftrepl without deletes eqiad -> codfw
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:40 godog: swift eqiad-prod: continue ms-be1027 decom - [[phab:T233289|T233289]]
* 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9147 and previous config saved to /var/cache/conftool/dbconfig/20190923-073044-marostegui.json
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9146 and previous config saved to /var/cache/conftool/dbconfig/20190923-071537-marostegui.json
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:08 marostegui: Stop MySQL on db1123 to reboot to change binlog format and kernel - [[phab:T230783|T230783]]
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 to change binlog format [[phab:T230783|T230783]]', diff saved to https://phabricator.wikimedia.org/P9145 and previous config saved to /var/cache/conftool/dbconfig/20190923-070628-marostegui.json
* 20:33 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1123 and db1078 roles, db1078 will serve logpager and recentchanges, db1123 will just serve general traffic', diff saved to https://phabricator.wikimedia.org/P9144 and previous config saved to /var/cache/conftool/dbconfig/20190923-065056-marostegui.json
* 20:32 samtar@deploy1002: Started scap: Backport for [[gerrit:835689{{!}}Add wmgMFDefaultEditor back in for future use]]
* 05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1066 from config [[phab:T233071|T233071]] (duration: 00m 56s)
* 20:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1066 from config [[phab:T233071|T233071]] (duration: 01m 15s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 samtar@deploy1002: Started scap: Backport for [[gerrit:835206{{!}}Disable MobileFrontend default editor a/b test (T302356)]]
* 20:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] (duration: 04m 58s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 20:15 samtar@deploy1002: Started scap: Backport for [[gerrit:835648{{!}}Enable DiscussionTools reply button visual enhancements on cswiki+huwiki (T315626)]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] (duration: 05m 46s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:04 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:835635{{!}}MobileWebUIActions sample rate to 1 on testwiki (T302108)]]
* 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34966 and previous config saved to /var/cache/conftool/dbconfig/20220927-194908-ladsgroup.json
* 19:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 19:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]]
* 18:02 brennen: 1.40.0-wmf.3 ([[phab:T314192|T314192]]) no current blockers, promoting to group0
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
* 17:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1002.eqiad.wmnet
* 17:49 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 17:48 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 17:47 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 17:47 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 17:39 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 17:38 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 17:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:29 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 17:28 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest[1001-1002].eqiad.wmnet
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1003.eqiad.wmnet
* 17:08 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 14:56 mforns@deploy1002: Finished deploy [airflow-dags/analytics@25dda27]: (no justification provided) (duration: 00m 11s)
* 14:56 mforns@deploy1002: Started deploy [airflow-dags/analytics@25dda27]: (no justification provided)
* 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34958 and previous config saved to /var/cache/conftool/dbconfig/20220927-143831-ladsgroup.json
* 14:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host logstash2036.codfw.wmnet with OS buster
* 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34957 and previous config saved to /var/cache/conftool/dbconfig/20220927-143109-ladsgroup.json
* 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34956 and previous config saved to /var/cache/conftool/dbconfig/20220927-143047-ladsgroup.json
* 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS buster
* 14:25 Lucas_WMDE: END lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]], 710183 rows done
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34955 and previous config saved to /var/cache/conftool/dbconfig/20220927-142324-ladsgroup.json
* 14:23 mforns@deploy1002: Finished deploy [airflow-dags/analytics@66dfa44]: (no justification provided) (duration: 00m 46s)
* 14:22 mforns@deploy1002: Started deploy [airflow-dags/analytics@66dfa44]: (no justification provided)
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34954 and previous config saved to /var/cache/conftool/dbconfig/20220927-141541-ladsgroup.json
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 Lucas_WMDE: BEGIN lucaswerkmeister-wmde@mwmaint1002:~$ PHP=php7.4 mwscript updateCollation.php incubatorwiki --force # [[phab:T315552|T315552]]
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P34953 and previous config saved to /var/cache/conftool/dbconfig/20220927-140817-ladsgroup.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 taavi@deploy1002: Finished scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] (duration: 06m 59s)
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P34952 and previous config saved to /var/cache/conftool/dbconfig/20220927-140034-ladsgroup.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 taavi@deploy1002: taavi and migr: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:59 taavi@deploy1002: Started scap: Backport for [[gerrit:835590{{!}}Track use of Searchbox footer on Wikidata (T306933)]], [[gerrit:835591{{!}}Track use of Searchbox footer on Wikidata (T306933)]]
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34951 and previous config saved to /var/cache/conftool/dbconfig/20220927-135310-ladsgroup.json
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34950 and previous config saved to /var/cache/conftool/dbconfig/20220927-134528-ladsgroup.json
* 12:42 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 12:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:28 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:26 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:23 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:18 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 jbond: upload new wmf-laptop_0.5.4 package
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:58 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 mvernon@cumin1001: START - Cookbook sre.dns.netbox
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2028-2039].codfw.wmnet
* 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:38 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:14 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2028-2039].codfw.wmnet
* 10:11 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:11 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:10 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:06 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1028-1033,1035-1039].eqiad.wmnet
* 10:03 moritzm: rebalance ganeti/codfw row D after completed Bullseye update [[phab:T311686|T311686]]
* 09:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:13 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34942 and previous config saved to /var/cache/conftool/dbconfig/20220927-082023-ladsgroup.json
* 08:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34941 and previous config saved to /var/cache/conftool/dbconfig/20220927-082001-ladsgroup.json
* 08:15 moritzm: restarting apache/FPM on mw canaries to pick up Expat security updates
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34938 and previous config saved to /var/cache/conftool/dbconfig/20220927-080454-ladsgroup.json
* 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-eqiad
* 07:58 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-eqiad
* 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 07:54 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 07:52 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin1001 - [[phab:T310745|T310745]]
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P34937 and previous config saved to /var/cache/conftool/dbconfig/20220927-074948-ladsgroup.json
* 07:49 XioNoX: upgrade python3-pynetbox to 6.6.0 on cumin2002 - [[phab:T310745|T310745]]
* 07:48 moritzm: installing expat security updates on stretch/buster/bullseye
* 07:39 moritzm: uploaded expat 2.2.0-2+deb9u5+wmf1 to apt.wikimedia.org/stretch-wikimedia
* 07:36 jayme: published image docker-registry.discovery.wmnet/golang1.18:1.18-1
* 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34936 and previous config saved to /var/cache/conftool/dbconfig/20220927-073523-ladsgroup.json
* 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34935 and previous config saved to /var/cache/conftool/dbconfig/20220927-073451-ladsgroup.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34934 and previous config saved to /var/cache/conftool/dbconfig/20220927-073441-ladsgroup.json
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34933 and previous config saved to /var/cache/conftool/dbconfig/20220927-071938-ladsgroup.json
* 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P34932 and previous config saved to /var/cache/conftool/dbconfig/20220927-070431-ladsgroup.json
* 06:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 8220
* 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 8220
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34930 and previous config saved to /var/cache/conftool/dbconfig/20220927-064925-ladsgroup.json
* 05:28 marostegui: Install 10.6.10 on db1124, db1125, pc1014, pc2014 [[phab:T318128|T318128]]
* 03:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.1 (duration: 02m 03s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]] (duration: 36m 01s)
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.3 refs [[phab:T314192|T314192]]
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34928 and previous config saved to /var/cache/conftool/dbconfig/20220927-020124-ladsgroup.json
* 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34927 and previous config saved to /var/cache/conftool/dbconfig/20220927-020103-ladsgroup.json
* 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34926 and previous config saved to /var/cache/conftool/dbconfig/20220927-014556-ladsgroup.json
* 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34925 and previous config saved to /var/cache/conftool/dbconfig/20220927-013050-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2019-09-22 ==
* off: marostegui set s3 master RW

== 2022-09-26 ==
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:31 TheresNoTime: closing UTC late backport window
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on analytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on analytics_test [airflow-dags@a69b031]
* 13:56 moritzm: installing mako security updates
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on analytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on analytics [airflow-dags@a69b031]
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 12:25 moritzm: installing unzip security updates
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: cleaned up my old home dir on matomo1002 and ran `apt-get clean` plus some other cleanup steps to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2019-09-21 ==
== 2022-09-25 ==
* 05:42 shdubsh: re-enable input-kafka-rsyslog-shipper in codfw
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 05:33 shdubsh: drop input-kafka-rsyslog-shipper in codfw
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 02:15 bblack: dbproxy1017: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 02:14 bblack: dbproxy1016: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 01:52 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash2004-5-6
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 01:34 mutante: restarting mobileapps service on scb*
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 01:34 mutante: restarted mobileapps service on scb1001
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1088.eqiad.wmnet
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 01:21 bblack: re-pooling cp108[78] in D2 via confctl
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 01:14 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash1007
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 01:08 shdubsh: removed input-kafka-rsyslog-shipper-eqiad/codfw from logstash inputs logstash1008 and logstash1009
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 00:54 mutante: aqs1009 - systemctl restart aqs
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 00:54 mutante: aqs1006 - systemctl restart aqs
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 00:48 mutante: aqs1005 - systemctl restart aqs
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 00:46 shdubsh: restarting logstash on logstash1008 without udp-localhost-eqiad/codfw configs
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 00:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1088.eqiad.wmnet
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 00:38 bblack: depooling confctl things in rack D2
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 00:38 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet


== 2019-09-20 ==
== 2022-09-23 ==
* 21:30 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: fix [[phab:T233453|T233453]] (duration: 00m 56s)
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 21:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: fix [[phab:T233453|T233453]] (duration: 00m 58s)
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 19:26 XioNoX: update eqsin firewall filters - [[phab:T233268|T233268]]
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 16:35 krinkle@deploy1001: Synchronized vendor/: {{Gerrit|ead70240892e9}} (duration: 00m 59s)
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 16:14 XioNoX: update eqiad firewall filters - [[phab:T233268|T233268]]
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 16:11 XioNoX: update esams firewall filters - [[phab:T233268|T233268]]
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:17 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bgwiki --logwiki=metawiki 'Newrdkter' 'NRdk' ([[phab:T233313|T233313]])
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:03 XioNoX: remove AS-PATH prepending in ams
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:29 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 11:16 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 11:15 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 10:17 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 09:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 09:31 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 09:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 09:30 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
* 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 09:30 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 08:52 jynus: creating new database on m1 "bacula9" [[phab:T229209|T229209]]
* 08:28 hashar: Killed zuul-server process on contint2001 which was establishing connections to Gerrit and filling the pool of allowed ssh connections # [[phab:T233390|T233390]]
* 08:23 hashar: CI in default since it is somehow no longer able to fetch from Gerrit [[phab:T233390|T233390]]
* 08:20 hashar: contint1001: upgrade zuul to 2.5.1-wmf10 # [[phab:T203846|T203846]]
* 08:12 hashar: contint2001: upgrade zuul to 2.5.1-wmf10 # [[phab:T203846|T203846]]
* 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
* 07:46 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
* 07:45 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:14 godog: eqiad-prod: start ms-be1027 decom - [[phab:T233289|T233289]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from logpager and contributions after testing, repool back with normal weight on main traffic [[phab:T223151|T223151]]', diff saved to https://phabricator.wikimedia.org/P9136 and previous config saved to /var/cache/conftool/dbconfig/20190920-052902-marostegui.json
* 05:27 marostegui: Analyze table enwiki.logging on db2102 - [[phab:T223151|T223151]]
* 05:07 marostegui: Remove temporary index on hiwikisource views [[phab:T219374|T219374]]
* 01:06 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC (duration: 02m 51s)
* 01:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/TimedMediaHandler/: [[phab:T233360|T233360]] Fix Safari 13.0 regression in video playback with audio (duration: 00m 58s)
* 01:03 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC


== 2019-09-19 ==
== 2022-09-22 ==
* 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 22:51 ejegg: updated payments-wiki from {{Gerrit|adef0e858f}} to {{Gerrit|939b771800}}
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:34 mutante: gerrit1001 - stopping puppet, removing gerrit IP from interface, rebooting
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:37 niharika29@deploy1001: Synchronized wmf-config/VariantSettings.php: Enable special:mute on testwiki; [[phab:T231577|T231577]] (duration: 00m 56s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 XioNoX: push firewall policies to pfw3-eqiad - [[phab:T233325|T233325]]
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:07 XioNoX: push firewall policies to pfw3-codfw - [[phab:T233325|T233325]]
* 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
* 19:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.23 refs [[phab:T220748|T220748]]
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 twentyafterfour: There are currently no blockers for [[phab:T220748|T220748]] so I am preparing to deploy 1.34.0-wmf.23 to all wikis.
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:14 XioNoX: add TCP-MSS 1436 to cr2-eqiad external interfaces - [[phab:T232602|T232602]]
* 20:55 brennen: end of utc late backport & config window
* 18:12 XioNoX: add TCP-MSS 1436 to cr1-eqiad external interfaces - [[phab:T232602|T232602]]
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:01 bblack: lvs2004 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 17:55 mutante: puppetmaster1001 - add mcrouter cert for mw1298.eqiad.wmnet ([[phab:T192457|T192457]])
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 17:52 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 17:48 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: [[phab:T156095|T156095]], {{Gerrit|32cf50453cd}} (duration: 01m 04s)
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:47 arlolra@deploy1001: Finished deploy [parsoid/deploy@77630c5]: Updating Parsoid to {{Gerrit|6bf23c2}} (duration: 08m 52s)
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 17:43 Krinkle: Move whisper/MediaWiki/wanobjectcache/revision_row_1/29 to whisper/MediaWiki/wanobjectcache/revision_row_1_29 on graphite1004 and graphite2003 ([[phab:T232907|T232907]])
* 20:36 brennen@deploy1002: backport aborted: (duration: 02m 16s)
* 17:38 arlolra@deploy1001: Started deploy [parsoid/deploy@77630c5]: Updating Parsoid to {{Gerrit|6bf23c2}}
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:27 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:27 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/includes/libs/objectcache/wancache: {{Gerrit|2e910c9d3f8c04f7db}}, [[phab:T232907|T232907]] (duration: 01m 03s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:23 bblack: lvs2005 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:19 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:16 bblack: lvs200[456] - puppet disabled for https://gerrit.wikimedia.org/r/536324 deploy/test
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:14 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@69b3737]: Update mobileapps to {{Gerrit|cfc3062}} (duration: 05m 42s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:08 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@69b3737]: Update mobileapps to {{Gerrit|cfc3062}}
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:31 _joe_: manually removed the purge_checkuser cron from mwmaint1002, to have puppet recreate it
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 16:20 ejegg: updated fundraising CiviCRM from {{Gerrit|90db6cb5a1}} to {{Gerrit|5def62ab05}}
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 16:15 papaul: shutting down scs-a1-codfw for replacement
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 15:26 moritzm: repooling restbase2012 after completed Cassandra bootstrap [[phab:T224553|T224553]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:25 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=restbase,service=cassandra,dc=codfw,name=restbase2012.codfw.wmnet
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-backend,dc=codfw,name=restbase2012.codfw.wmnet
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-ssl,dc=codfw,name=restbase2012.codfw.wmnet
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase,dc=codfw,name=restbase2012.codfw.wmnet
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 15:05 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 18:38 dancy@deploy1002: Started scap: testing
* 14:56 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) ([[phab:T229286|T229286]]) (duration: 05m 39s)
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 14:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) ([[phab:T229286|T229286]])
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 14:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3 (duration: 10m 42s)
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 14:37 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 14:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2 (duration: 08m 24s)
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 14:31 mobrovac: bootstrap restbase2012-c -- [[phab:T224553|T224553]]
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 14:28 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 14:28 mobrovac@deploy1001: deploy aborted: Remove the TID suffix in the ETag, if present - [[phab:T230272|T230272]] (duration: 11m 20s)
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 14:28 sbassett: Deployed security patch for [[phab:T224203|T224203]] (php-1.34.0-wmf.23)
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 sbassett: Deployed security patch for [[phab:T224203|T224203]]
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:19 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:18 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:17 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present - [[phab:T230272|T230272]]
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:54 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API ([[phab:T216750|T216750]]) (duration: 03m 06s)
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:51 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API ([[phab:T216750|T216750]])
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/Translate: [[phab:T233308|T233308]] (duration: 01m 07s)
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 moritzm: powercycling mw1300
* 16:39 dancy@deploy1002: Sync cancelled.
* 13:12 mobrovac: bootstrap restbase2012-b -- [[phab:T224553|T224553]]
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1089 into contributions service [[phab:T223151|T223151]]', diff saved to https://phabricator.wikimedia.org/P9133 and previous config saved to /var/cache/conftool/dbconfig/20190919-130848-marostegui.json
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 13:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - [[phab:T224553|T224553]] (duration: 21m 38s)
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:39 mobrovac@deploy1001: Started deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - [[phab:T224553|T224553]]
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:36 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:48 mobrovac: bootstrap restbase2012-a -- [[phab:T224553|T224553]]
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:32 Urbanecm: EU SWAT done
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|199a05c}}: Add new throttle rule for Czech wiki course ([[phab:T233199|T233199]]) (duration: 01m 01s)
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|eab7c6a}}: {{Gerrit|c80f026}}: GrowthExperiments: GrowthExperiments: Enable Special:Homepage for euwiki, GrowthExperiments: Enable help panel for euwiki ([[phab:T233066|T233066]], [[phab:T233065|T233065]]) (duration: 01m 05s)
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:54 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: security [[phab:T207094|T207094]] (duration: 01m 02s)
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:53 urbanecm@deploy1001: sync-file aborted: security [[phab:T207094|T207094]] (duration: 00m 28s)
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 09:51 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: security [[phab:T207094|T207094]] (duration: 01m 05s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:22 godog: power back on ms-be1027, found with power off
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:31 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|393441b}}: Change configuration of AbuseFilter extension for enwikisource ([[phab:T231750|T231750]]) (duration: 01m 04s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:22 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 08:21 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: revert [[phab:T207094|T207094]] (duration: 01m 04s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:20 jynus@cumin1001: START - Cookbook sre.hosts.downtime
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 08:14 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: security [[phab:T207094|T207094]] (duration: 01m 06s)
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 08:11 marostegui: Rename tables on db1133:labspuppet [[phab:T233281|T233281]]
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 07:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 07:40 moritzm: rebooting failoid1001 for kernel update
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give more logpager weight to db1089 [[phab:T223151|T223151]]', diff saved to https://phabricator.wikimedia.org/P9131 and previous config saved to /var/cache/conftool/dbconfig/20190919-072234-marostegui.json
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:01 moritzm: reimaging restbase2012 to stretch [[phab:T224553|T224553]]
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 06:18 marostegui: Sanitize hiwikisource on db1124:3313 and db2094:3313 [[phab:T219374|T219374]]
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Temporarily pool db1089 into enwiki logpager [[phab:T223151|T223151]]', diff saved to https://phabricator.wikimedia.org/P9130 and previous config saved to /var/cache/conftool/dbconfig/20190919-060440-marostegui.json
* 05:11 marostegui: Stop MySQL on db2055 for decommission [[phab:T233186|T233186]]
* 05:11 marostegui: Remove db2055 from tendril and zarcillo [[phab:T233186|T233186]]


== 2019-09-18 ==
== 2022-09-21 ==
* 23:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MobileFrontend/resources/dist/: [[phab:T233260|T233260]], {{Gerrit|1667ed957a19067}} (duration: 01m 04s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:58 cmjohnson1: enabled asw2-c-eqiad interface xe-2/0/45
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:40 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/resources/Resources.php: {{Gerrit|d6dadfdb0b237c918}} (duration: 01m 03s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: [[phab:T156095|T156095]], {{Gerrit|ff44043efa59e9}} (duration: 01m 05s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:13 cmjohnson1: disabling asw2-c-eqiad xe-2/0/45 - cr1-eqiad to replace optic [[phab:T233265|T233265]]
* 20:46 tgr_: UTC late deploys done
* 21:54 gilles: [[phab:T233095|T233095]] Purging all eswiki articles (both desktop and mobile this time)
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:53 gilles@deploy1001: Synchronized php-1.34.0-wmf.22/maintenance/purgeList.php: [[phab:T233095|T233095]] Make purgeList.php use getCdnUrls() (duration: 01m 04s)
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 21:13 XioNoX: enable damping on primary codfw-eqiad link - [[phab:T196432|T196432]]
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:09 XioNoX: enable damping on codfw-ulsfo link - [[phab:T196432|T196432]]
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No longer load InitialiseSettings at all in CommonSettings (duration: 01m 03s)
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Quick fix for wmfLoadInitialiseSettings() (duration: 01m 03s)
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 20:40 jforrester@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:23 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out call to InitialiseSettings.php (duration: 01m 04s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:18 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Drop support for serialised PHP (duration: 01m 04s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Never write to serialised PHP [[phab:T223602|T223602]] (duration: 01m 04s)
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 20:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T208246|T208246]] Enforce a 10-byte password for privileged users (duration: 01m 04s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:57 urandom: decommissioning Cassandra, restbase2012-c -- [[phab:T224553|T224553]]
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:42 gilles: [[phab:T233095|T233095]] Purging all pages on eswiki
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 19:27 joal@deploy1001: Finished deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix (duration: 03m 40s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:24 mutante: ganeti1001 - deleting krypton.eqiad.wmnet - decom [[phab:T231546|T231546]]
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 19:23 joal@deploy1001: Started deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.23 refs [[phab:T220748|T220748]] (duration: 01m 04s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.23 refs [[phab:T220748|T220748]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 19:07 twentyafterfour: There appear to be no blockers on [[phab:T220748|T220748]] so I'll proceed with deploying 1.34.0-wmf.23 to group 1.
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:01 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix (duration: 02m 12s)
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 18:59 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 18:55 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train (duration: 01m 05s)
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 18:54 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:46 XioNoX: remove `border-in4 term ddos-0906` from all routers
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:53 Amir1: Creating hiwikisource is done
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:50 urandom: decommissioning Cassandra, restbase2012-b -- [[phab:T224553|T224553]]
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:48 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 32s)
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 17:45 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Add hiwikisource logos ([[phab:T218155|T218155]]) (duration: 01m 04s)
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:43 ladsgroup@deploy1001: Synchronized wmf-config/VariantSettings.php: Add hiwikisource ([[phab:T218155|T218155]]) (duration: 01m 05s)
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add hiwikisource ([[phab:T218155|T218155]]) (duration: 01m 04s)
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:38 Amir1: manual write on hiwikisource "wikiadmin@10.64.0.205(hiwikisource)> update text set old_text = 'DB://cluster25/1';" ([[phab:T218155|T218155]])
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:33 Amir1: mwscript maintenance/createAndPromote.php --wiki=hiwikisource --force --sysop Ladsgroup ([[phab:T218155|T218155]])
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 17:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 17:22 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 06s)
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 17:22 Jeff_Green: authdns-update to deploy DNS for new fundraising host
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 17:03 mutante: ganeti2004 - resetting DRAC in an attempt to make IPMI work again
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 17:00 Urbanecm: Morning SWAT done
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 16:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: [[:gerrit:537691{{!}}Enable DNS blacklist on testwiki temporarily]] ([[phab:T230822|T230822]]) (duration: 01m 03s)
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:43 Urbanecm: {{Gerrit|8340be9}} sync is for [[phab:T230822|T230822]], mistakenly inserted `test` instead of the task number
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:42 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|8340be9}}: Enable logging for BlockManager channel at info level (test) (duration: 01m 04s)
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 16:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|dc1298d}}: Add Draft and Draft_talk aliases for wikis that define draft namespace ([[phab:T223472|T223472]]) (duration: 01m 02s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|6e59651}}: Disable FundraiserLandingPage extension on test.wikipedia.org ([[phab:T203020|T203020]]) (duration: 01m 04s)
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 16:26 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/tewikisource.png ([[phab:T232065|T232065]])
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:25 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|7c987fc}}: Change Telugu Wikisource Logo ([[phab:T232065|T232065]]; 2/2) (duration: 01m 06s)
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|7c987fc}}: Change Telugu Wikisource Logo ([[phab:T232065|T232065]]; 1/2) (duration: 01m 05s)
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:18 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|817d679}}: Turn on EventLogging at 100% for DonateWiki ([[phab:T233145|T233145]]) (duration: 01m 04s)
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 16:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|ba30276}}: Add suppressredirect right to filemovers on bnwiki ([[phab:T233137|T233137]]) (duration: 01m 05s)
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:55 moritzm: repooling restbase2011 after reimage/bootstrap
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:53 urandom: decommissioning Cassandra, restbase2012-a -- [[phab:T224553|T224553]]
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:59 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-backend
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 14:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:52 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:50 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:41 joal@deploy1001: Finished deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train (duration: 05m 28s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 13:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 joal@deploy1001: Started deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:11 hashar: Restarting Jenkins, starting Zuul
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:56 marostegui: Deploy schema change on the following s6 hosts: db1088, db1093, db1096, db1098, db1139, dbstore1005 - [[phab:T231172|T231172]]
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:52 hashar: gracefully stopping Zuul (kill SIGUSR1) to prepare for Jenkins restart
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 12:40 marostegui: Deploy schema change on s6 codfw master with replication [[phab:T231172|T231172]]
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:18 vgutierrez: restarting ats-tls to avoid spreading Proxy-Connection header - [[phab:T233205|T233205]]
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:03 marostegui: Stop haproxy on dbproxy1006 - [[phab:T233207|T233207]]
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:29 mobrovac: bootstrap restbase2011-c -- [[phab:T224553|T224553]]
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:27 awight: EU SWAT complete
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:27 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: [[gerrit:537586{{!}}Enable FileImport source wiki editing (T228851)]] (duration: 00m 59s)
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 11:25 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:537586{{!}}Enable FileImport source wiki editing (T228851)]] (duration: 01m 03s)
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:537375{{!}}NowCommons test & test2wiki configuration (T228851)]] (duration: 01m 15s)
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 10:17 onimisionipe: force relocation of shards for eqiad search(chi) cluster
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:16 moritzm: restarting postgres on puppetdb1002/2002 after updating permissions for replication user
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:00 mobrovac: bootstrap restbase2011-b -- [[phab:T224553|T224553]]
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:37 godog: run swiftrepl eqiad -> codfw on all containers, no deletes
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul
* 09:37 effie: upgrading netmon* to PHP 7.2.22 [[phab:T230024|T230024]]
* 09:35 godog: run swiftrepl eqiad -> codfw for transcoded containers
* 08:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:57 jynus@cumin1001: START - Cookbook sre.hosts.downtime
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9125 and previous config saved to /var/cache/conftool/dbconfig/20190918-085721-marostegui.json
* 08:22 mobrovac: bootstrap restbase2011-a -- [[phab:T224553|T224553]]
* 07:43 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 07:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:43 moritzm: reimaging restbase2011 to stretch [[phab:T224553|T224553]]
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P9124 and previous config saved to /var/cache/conftool/dbconfig/20190918-060401-marostegui.json
* 05:58 marostegui: Deploy schema change on db2097:3316 - [[phab:T233135|T233135]]
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool host after onsite checks [[phab:T233184|T233184]]', diff saved to https://phabricator.wikimedia.org/P9123 and previous config saved to /var/cache/conftool/dbconfig/20190918-054755-marostegui.json
* 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2055 from config [[phab:T233186|T233186]] (duration: 01m 04s)
* 05:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2055 from config [[phab:T233186|T233186]] (duration: 01m 06s)
* 05:03 marostegui: Start MySQL on db2127 [[phab:T233184|T233184]]
* 03:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.util/: {{Gerrit|0333729e}}, {{Gerrit|ccfe88241}} (duration: 01m 07s)


== 2019-09-17 ==
== 2022-09-20 ==
* 23:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.23 refs [[phab:T220748|T220748]]
* 20:19 cjming: end of UTC late backport window
* 23:20 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/VisualEditor/extension.json: {{Gerrit|aae62a87be3c954378b07dfb881f79a4f73c5def}} (duration: 01m 05s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 22:43 dzahn@cumin1001: Updating IPMI password on 6 hosts - dzahn@cumin1001
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add comment about MinimumPasswordLengthToLogin (duration: 01m 03s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:45 cstone: civicrm revision changed from {{Gerrit|45dbfdb96f}} to {{Gerrit|90db6cb5a1}}
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 21:45 tzatziki: removed one file for legal compliance
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 21:12 XioNoX: delete AS13335 91.198.174.0/24 RPKI/ROA
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 21:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 21:10 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 21:10 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 21:08 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 21:07 twentyafterfour@deploy1001: Finished scap: testwikis to 1.34.0-wmf.23 refs [[phab:T220748|T220748]] (duration: 24m 55s)
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 21:01 XioNoX: enable interface damping on primary eqiad-esams link (eqiad side) - [[phab:T196432|T196432]]
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:47 dzahn@cumin1001: Updating IPMI password on 660 hosts - dzahn@cumin1001
* 18:50 jynus: restart db2100:s7 to apply new config
* 20:46 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 20:42 twentyafterfour@deploy1001: Started scap: testwikis to 1.34.0-wmf.23 refs [[phab:T220748|T220748]]
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:39 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:31 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/phpCharToUpper.json: {{Gerrit|8372dcdcdfe02261}} (duration: 00m 56s)
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 20:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/Title.js: {{Gerrit|8372dcdcdfe02261}} (duration: 02m 08s)
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 20:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 20:19 dzahn@cumin1001: Updating IPMI password on 21 hosts - dzahn@cumin1001
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 20:15 tzatziki: changing email for User:Olag
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:12 dzahn@cumin1001: Updating IPMI password on 18 hosts - dzahn@cumin1001
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:04 dzahn@cumin1001: Updating IPMI password on 29 hosts - dzahn@cumin1001
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 19:32 ejegg: updated payments-wiki from {{Gerrit|fc82318180}} to {{Gerrit|adef0e858f}}
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:26 dzahn@cumin1001: Updating IPMI password on 543 hosts - dzahn@cumin1001
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 19:25 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 19:22 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 19:20 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 19:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 19:14 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 19:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 19:08 twentyafterfour: Branch cut is in progress for 1.34.0-wmf.23
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 19:05 urandom: decommissioning Cassandra, restbase2011-c -- [[phab:T224553|T224553]]
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 18:06 papaul: upgrading firmware on scs1-a1-codfw
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 17:18 ejegg: updated SmashPig payments listener from {{Gerrit|a0151434f4}} to {{Gerrit|dc0c6b208b}}
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 17:09 urandom: decommissioning Cassandra, restbase2011-b -- [[phab:T224553|T224553]]
* 16:42 dancy@deploy1002: Sync cancelled.
* 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 17:00 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 16:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:09 awight@deploy1002: backport aborted: (duration: 00m 33s)
* 16:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 16:04 jbond42: run octocatalog-diff from elnath with current facts
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 15:55 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Revert Set MinimumPasswordLengthToLogin to 10 for all prived groups, not just +staff (duration: 00m 55s)
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 15:39 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 15:38 urandom: decommissioning Cassandra, restbase2011-a -- [[phab:T224553|T224553]]
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Host down for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9120 and previous config saved to /var/cache/conftool/dbconfig/20190917-151714-marostegui.json
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:16 marostegui: Stop MySQL on db2127 and shut the host down for onsite maintenance
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:52 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:52 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on wikitech for [[phab:T232464|T232464]]
* 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: {{Gerrit|1ac09d4709c645558f644a885fadc49c05cc04b9}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 39s)
* 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 8 wikis for [[phab:T232464|T232464]]
* 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: {{Gerrit|1a27e05a7ca53a063d5f9e284d6a09546ac8691c}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 52s)
* 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 7 wikis for [[phab:T232464|T232464]]
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 6 wikis for [[phab:T232464|T232464]]
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 5 wikis for [[phab:T232464|T232464]]
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 4 wikis for [[phab:T232464|T232464]]
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on remaining section 3 wikis for [[phab:T232464|T232464]]
* 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
* 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 2 wikis for [[phab:T232464|T232464]]
* 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
* 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 1 wikis for [[phab:T232464|T232464]]
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:48 anomie@mwmaint1002: Running cleanupRevActorPage.php on test wikis and mediawikiwiki for [[phab:T232464|T232464]]
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:39 anomie@deploy1001: Synchronized php-1.34.0-wmf.22/includes/MergeHistory.php: Backport MergeHistory fix for [[phab:T232464|T232464]] [[gerrit:537436]] (duration: 00m 54s)
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:35 ottomata: bouncing eventstreams service on scb hosts
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:15 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0b55db6f80df5f4c89f969332a6b31077a7172c4}}: Enable Tech Wishes survey on dewiki ([[phab:T316676|T316676]]) (duration: 04m 12s)
* 14:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:03 herron: migrating kafka1003 to kafka-main1003 [[phab:T225005|T225005]]
* 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
* 14:00 jbond42: forcing puppet run
* 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
* 14:00 bblack: lvs1015 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
* 08:35 hashar: Restarted CI Jenkins for plugin update
* 13:59 bblack: lvs2003 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
* 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:57 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:52 bblack: lvs1016 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
* 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832993{{!}}testwiki: Enable Section Translation on haw, la, ps, and xh Wikipedias (T317289)]] (duration: 03m 46s)
* 13:52 bblack: lvs2006 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:45 moritzm: repooling restbase2010 after reimage/completed bootstrap
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 db1104 db1085 db1086 after PDU maintenance - [[phab:T227539|T227539]]', diff saved to https://phabricator.wikimedia.org/P9117 and previous config saved to /var/cache/conftool/dbconfig/20190917-132102-marostegui.json
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:17 godog: force-run puppet in eqiad to update exported resources
* 07:10 kart_: Updated cxserver to 2022-09-15-113346-production ([[phab:T317289|T317289]], [[phab:T315209|T315209]])
* 13:14 jbond42: currently running octocatalog-diff for all hosts from elnath
* 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 13:02 marostegui: Start replication on db1130 db1104 db1085 db1086 after PDU maintenance is completed - [[phab:T227539|T227539]]
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:01 cmjohnson1: The PDU swap in rack B3 eqiad is finished.
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:30 mobrovac: bootstrap restbase2010-c - [[phab:T224553|T224553]]
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:32 Urbanecm: EU SWAT is done
* 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:31 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
* 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: {{Gerrit|290e207}}: Add channels for the Translate and TranslationsNotification extension ([[phab:T221119|T221119]], [[phab:T144780|T144780]], [[phab:T143073|T143073]]) (duration: 00m 56s)
* 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:29 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
* 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:27 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: [[gerrit:537345{{!}}Use https rather than protocol-relative remote API URLs (T228851)]] (duration: 00m 58s)
* 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:24 cmjohnson1: commencing pdu swap rack b3 eqiad [[phab:T227539|T227539]]
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:22 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: [[gerrit:536732{{!}}Update ORES filter threshold configuration for new huwiki model (T230031)]] (duration: 00m 55s)
* 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:17 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: [[gerrit:537092{{!}}Enable EditorJourney for euwiki (T232061)]] (duration: 00m 56s)
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:13 Urbanecm: Run mwscript emptyUserGroup.php --wiki=aawiki 'inactive' ([[phab:T150538|T150538]])
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
* 10:58 mobrovac: bootstrap restbase2010-b - [[phab:T224553|T224553]]
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 36m 08s)
* 10:44 vgutierrez: replacing nginx with ATS in cp1076 (upload cluster) - [[phab:T231433|T231433]]
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - [[phab:T227539|T227539]]', diff saved to https://phabricator.wikimedia.org/P9116 and previous config saved to /var/cache/conftool/dbconfig/20190917-094827-marostegui.json
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:46 marostegui: Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - [[phab:T227539|T227539]]
* 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:30 hashar: Restarting CI jenkins
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:29 marostegui: Downtime db1073 db1130 db1104 db1085 db1086 for the PDU maintenance [[phab:T227539|T227539]]
* 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:18 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 09:16 mobrovac: bootstrap restbase2010-a - [[phab:T224553|T224553]]
* 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 09:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:05 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 100% of users who accept cookies - [[phab:T219150|T219150]] (duration: 00m 57s)
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:37 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp3034 - [[phab:T231849|T231849]] [[phab:T232724|T232724]]
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1074 with just 50 to keep its warmness level just in case [[phab:T231638|T231638]]', diff saved to https://phabricator.wikimedia.org/P9115 and previous config saved to /var/cache/conftool/dbconfig/20190917-075807-marostegui.json
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:48 effie: Enable puppet on mw*
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:42 elukey: reboot analytics-tool1004 (host running superset) for kernel updates
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:41 marostegui: Stop mysql on db1063 for decommissioning [[phab:T232564|T232564]]
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:40 marostegui: Remove db1063 from puppet and zarcillo [[phab:T232564|T232564]]
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:29 vgutierrez: repooling cp5007 without wikibase configuration - [[phab:T99531|T99531]]
* 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:19 vgutierrez: depooling cp5007 to ensure that wikibase removal goes as expected - [[phab:T99531|T99531]]
* 07:10 vgutierrez: getting rid of wikibase TLS certificate & nginx configuration on the text cache cluster - [[phab:T99531|T99531]]
* 06:56 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp2002, cp4021 and cp5001 - [[phab:T231849|T231849]]
* 06:55 vgutierrez: uploaded trafficserver 8.0.5-1wm8 to apt.wikimedia.org (stretch) - [[phab:T231849|T231849]]
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1066 [[phab:T233071|T233071]]', diff saved to https://phabricator.wikimedia.org/P9114 and previous config saved to /var/cache/conftool/dbconfig/20190917-065342-marostegui.json
* 06:49 moritzm: reimage restbase2010 to Stretch [[phab:T224553|T224553]]
* 05:57 vgutierrez: upgrading ATS to 8.0.5-1wm7 on cp2002 and cp4021 - [[phab:T232724|T232724]]
* 05:56 vgutierrez: uploaded trafficserver 8.0.5-1wm7 to apt.wikimedia.org (stretch) - [[phab:T232298|T232298]] [[phab:T232724|T232724]]
* 05:23 effie: disable puppet on mw* servers for 536979
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 master and remove read-only from s2 [[phab:T230785|T230785]]', diff saved to https://phabricator.wikimedia.org/P9113 and previous config saved to /var/cache/conftool/dbconfig/20190917-050133-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 as read-only for maintenance [[phab:T230785|T230785]]', diff saved to https://phabricator.wikimedia.org/P9112 and previous config saved to /var/cache/conftool/dbconfig/20190917-050043-marostegui.json
* 05:00 marostegui: Starting s2 failover from db1066 to db1122 - [[phab:T230785|T230785]]
* 04:57 effie: Downtiming HTTPS-blog on icinga - [[phab:T232412|T232412]]
* 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 and depool it from API [[phab:T230785|T230785]]', diff saved to https://phabricator.wikimedia.org/P9111 and previous config saved to /var/cache/conftool/dbconfig/20190917-041441-marostegui.json
* 04:11 marostegui: Start s2 pre-switchover steps [[phab:T230785|T230785]]
* 00:34 AndyRussG: updated fruec from {{Gerrit|fb29cb7407}} to {{Gerrit|97128874bf}}


== 2019-09-16 ==
== 2022-09-19 ==
* 23:53 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgDebugLogFile in VS (duration: 00m 55s)
* 22:59 ebernhardson: [[phab:T317200|T317200]] start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
* 23:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgDebugLogFile in CS (duration: 00m 55s)
* 21:21 maryum: Deployed security patch for [[phab:T302479|T302479]]
* 23:42 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgUploadThumbnailRenderHttpCustom* in VS (duration: 00m 54s)
* 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
* 23:41 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgUploadThumbnailRenderHttpCustom* in CS (duration: 00m 55s)
* 21:15 sbassett: Deployed security patch for [[phab:T312820|T312820]]
* 23:30 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wmgRC2UDPAddress in VS (duration: 00m 55s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgRC2UDPAddress in CS (duration: 00m 56s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:24 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgCopyUploadProxy in VS (duration: 00m 56s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgCopyUploadProxy in CS (duration: 00m 55s)
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T225261|T225261]] [[phab:T194019|T194019]] Adjust CentralNotice CSP for banner previews for FR-tech (duration: 00m 55s)
* 20:59 cjming: end of UTC late backport window
* 22:59 chaomodus: restarted nagios-nrpe-server on notebook1003
* 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: [[gerrit:833031{{!}}Add token_count subfield to outgoing_link (T317546)]] (duration: 03m 51s)
* 22:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use __DIR__ rather than global wmfConfgDir (duration: 00m 55s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:48 ebernhardson: unban elastic1027 from production-search-eqiad
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:55 XioNoX: remove 2 sessions to AS12871 on cr2-esams - [[phab:T232617|T232617]]
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:20 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820459{{!}}Wikifunctions: Drop two config items moved to docker]] (duration: 03m 38s)
* 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
* 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:829877{{!}}ExtensionDistributor: Add REL1_39 (T313925)]] (duration: 03m 38s)
* 20:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:12 cjming@deploy1002: Finished scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] (duration: 06m 31s)
* 20:15 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 20:06 cjming@deploy1002: cjming and arlolra: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 20:06 cjming@deploy1002: Started scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]]
* 20:08 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
* 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 20:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:55 XioNoX: reboot scs-a8-eqiad (at 100% CPU)
* 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:55 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
* 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
* 19:53 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
* 17:42 dancy@deploy1002: Installing scap version "4.21.0" for 561 hosts
* 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 17:36 dancy@deploy1002: Sync cancelled.
* 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 17:36 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 19:51 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
* 17:36 dancy@deploy1002: Started scap: testing, disregard
* 19:51 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 14:03 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage<nowiki>{</nowiki>.png,-1.5x.png,-2x.png<nowiki>}</nowiki> ([[phab:T317718|T317718]])
* 19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 14:02 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|6c7151d969b6997bd9cce042b7bc78c282dd9b26}}: Regenerate ukwikivoyage logo ([[phab:T317718|T317718]]) (duration: 03m 46s)
* 19:35 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:28 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 13:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbf161d148228e0e706813f923ab1a5d4b42757a}}: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro ([[phab:T314518|T314518]]) (duration: 04m 01s)
* 19:27 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:19 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
* 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:13 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
* 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:13 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 07:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a6c1ddf5cd1a46ab05f5d6fda4b938a3ee37238}}: Remove unnecessary wgNamespaceAliases from bnwiki ([[phab:T318003|T318003]]) (duration: 04m 16s)
* 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:09 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:03 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgCookieSetOnAutoBlock and wgCookieSetOnIpBlock to the default; never varied (duration: 00m 56s)
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up globals in InitialiseSettings.php (duration: 00m 56s)
* 19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 19:01 dzahn@cumin1001: Updating IPMI password on 0 hosts - dzahn@cumin1001
* 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 18:54 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
* 18:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 18:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T223602|T223602]] Variant configuration: Read JSON config for all wikis (duration: 00m 56s)
* 18:48 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set MinimumPasswordLengthToLogin to 10 for all prived groups, not just +staff (duration: 00m 56s)
* 18:40 jforrester@deploy1001: Synchronized src/WmfClusters.php: Use static VariantSettings instead of InitialiseSettings (noc-only change) (duration: 00m 55s)
* 18:40 mutante: phab1001 - racadm racreset
* 18:21 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Remove globals declaration and use via GLOBALS for testability (duration: 00m 56s)
* 18:15 Lucas_WMDE: Morning SWAT done
* 18:14 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:536982{{!}}bridge: enable EditTags for beta (T232582)]] (duration: 00m 58s)
* 18:12 herron: migrating kafka1002 to kafka-main1002 [[phab:T225005|T225005]]
* 18:09 mutante: registry2001 - restarting nginx
* 17:55 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 55s)
* 17:49 ejegg: updated SmashPig standalone from {{Gerrit|5d187092a7}} to {{Gerrit|a0151434f4}}
* 17:42 urandom: decommissioning Cassandra, restbase2010-c -- [[phab:T224553|T224553]]
* 17:42 ebernhardson: restart elasticsearch_6@production-search-eqiad on elastic1027 due to >1k orphan tasks
* 17:09 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 54s)
* 16:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make CommonSettings use mtime from VariantSettings (duration: 00m 55s)
* 16:58 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make InitialiseSettings use values from VariantSettings (duration: 00m 54s)
* 16:55 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Establish VariantSettings.php everywhere (duration: 00m 56s)
* 16:51 ebernhardson: ban elastic1027 from production-search-eqiad-chi
* 16:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223602|T223602]] Inject config object into InitialiseSettings-labs rather than use wgConf global (duration: 00m 55s)
* 15:42 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write JSON config for all wikis [[phab:T223602|T223602]] (duration: 00m 56s)