You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org)
imported>Stashbot
(ebernhardson: start restore of commonswiki_file from thanos-swift to cloudelastic)
Line 1: Line 1:
== 2022-07-05 ==
* 23:30 ebernhardson: start restore of commonswiki_file from thanos-swift to cloudelastic
* 23:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - [[phab:T309648|T309648]]
* 22:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - [[phab:T309648|T309648]]
* 22:28 ryankemper: [[phab:T309648|T309648]] Manually restarting `cloudelastic1006` before proceeding to a normal rolling restart of cloudelastic
* 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:55 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811339{{!}}Enable title above tabs everywhere (T311773)]] (duration: 03m 23s)
* 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:35 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: [[gerrit:811350{{!}}cirrus: Disable commonswiki writes to cloudelastic (T309648)]] (duration: 03m 42s)
* 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:27 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: [[gerrit:811279{{!}}job queue: Squelch errors related to unwritable cloudelastic (T309648)]] (duration: 03m 37s)
* 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:19 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: [[gerrit:811280{{!}}job queue: Squelch errors related to unwritable cloudelastic (T309648)]] (duration: 03m 43s)
* 20:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2174.codfw.wmnet with OS bullseye
* 20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
* 20:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2173.codfw.wmnet with OS bullseye
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811350{{!}}cirrus: Disable commonswiki writes to cloudelastic (T309648)]] (duration: 03m 23s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2174.codfw.wmnet with OS bullseye
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|66c973087b7736b22ce7edb5b830e50e31710e4a}}: QuickSurveys: Increase coverage of research-incentive survey ([[phab:T311015|T311015]]) (duration: 03m 28s)
* 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
* 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2171.codfw.wmnet with OS bullseye
* 20:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b1c217103753d886ab5b18b88f112ec26931bff2}}: GrowthExperiments: End mailing list campaign on eswiki ([[phab:T307985|T307985]]) (duration: 03m 39s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
* 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
* 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
* 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2173.codfw.wmnet with OS bullseye
* 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2172.codfw.wmnet with OS bullseye
* 19:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
* 19:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
* 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2172.codfw.wmnet with OS bullseye
* 18:53 papaul: power down moss-be2002 for NVMe installation
* 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab2001.wikimedia.org
* 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db2171.codfw.wmnet with OS bullseye
* 18:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:40 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.wikimedia.org
* 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab2001.codfw.wmnet
* 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2170.codfw.wmnet with OS bullseye
* 18:36 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:32 papaul: power down moss-be2001 for NVMe installation
* 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
* 18:32 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.codfw.wmnet
* 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
* 18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
* 18:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
* 18:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
* 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2174
* 18:01 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2174
* 18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2173
* 18:00 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2173
* 17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2172
* 17:59 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2172
* 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2171
* 17:57 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2171
* 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2170
* 17:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2170
* 17:54 mutante: disabling puppet on gitlab* - debugging gerrit:811276
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
* 17:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2170.codfw.wmnet with OS bullseye
* 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
* 17:33 moritzm: installing haproxy security updates on stretch
* 17:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
* 16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
* 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
* 16:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
* 16:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2169.codfw.wmnet with OS bullseye
* 16:44 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
* 16:34 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
* 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
* 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
* 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
* 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
* 16:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2169.codfw.wmnet with OS bullseye
* 16:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2164.codfw.wmnet with OS bullseye
* 15:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
* 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
* 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
* 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
* 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
* 15:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
* 15:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2164.codfw.wmnet with OS bullseye
* 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
* 15:09 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2169
* 15:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2169
* 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db2169
* 15:05 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db2169
* 15:05 moritzm: installing firejail updates on stretch
* 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
* 15:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:00 moritzm: draining ganeti2024 for eventual reimage [[phab:T311686|T311686]]
* 14:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
* 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
* 14:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
* 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:22 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 14:22 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 14:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:34 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:33 urbanecm: UTC afternoon B&C window done
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|300ef4a5ee6f0c35de831e88eb2f8169e7f66e97}}: static.php: Update call to deprecated IContextSource::getStats (duration: 03m 41s)
* 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:15 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|1287b969fc42aee6efae5ff1f1943394ba35e326}}: Drop deprecated feature flags ([[phab:T310684|T310684]]) (duration: 03m 32s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|891057f6ba555b2ece0424e3364d853eb20555da}}: Drop dependent feature flags ([[phab:T310684|T310684]]) (duration: 03m 37s)
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
* 12:42 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
* 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30861 and previous config saved to /var/cache/conftool/dbconfig/20220705-124101-ladsgroup.json
* 12:37 btullis@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
* 12:36 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
* 12:31 moritzm: draining ganeti2023 for eventual reimage [[phab:T311686|T311686]]
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): '[[phab:T311106|T311106]]', diff saved to https://phabricator.wikimedia.org/P30859 and previous config saved to /var/cache/conftool/dbconfig/20220705-122941-ladsgroup.json
* 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2158 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P30848 and previous config saved to /var/cache/conftool/dbconfig/20220705-110432-marostegui.json
* 11:01 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:30 _joe_: running benchmarks in codfw for php7.2/7.4 comparison.
* 10:29 moritzm: sudo gnt-cluster upgrade --to 3.0 for ganeti/codfw [[phab:T311686|T311686]]
* 10:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
* 10:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
* 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:00 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.19  refs [[phab:T308072|T308072]]
* 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:36 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.19  refs [[phab:T308072|T308072]] (duration: 34m 21s)
* 09:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
* 09:33 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
* 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:02 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.19  refs [[phab:T308072|T308072]]
* 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0)
* 08:52 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces
* 08:43 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:30 moritzm: uploaded 7.4.30-3+0~20220627.69+debian10~1.gbpf2b381+wmf1+buster3 to component/php74 (pulling php-common with the socket helper) [[phab:T311386|T311386]]
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30835 and previous config saved to /var/cache/conftool/dbconfig/20220705-082415-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30834 and previous config saved to /var/cache/conftool/dbconfig/20220705-082058-root.json
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30833 and previous config saved to /var/cache/conftool/dbconfig/20220705-080911-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30832 and previous config saved to /var/cache/conftool/dbconfig/20220705-080554-root.json
* 07:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|89aef540e22aaded6c279d9d11c769507e497b6a}}: MentorDashboard: enable the Vue version of the dashboard in beta ([[phab:T300532|T300532]]) (duration: 03m 18s)
* 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30831 and previous config saved to /var/cache/conftool/dbconfig/20220705-075408-root.json
* 07:54 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|c8c092a4133d119bf9aaece6f934ca7744ea6951}}: trwiki: Change old and new vector logos for 500k articles ([[phab:T311946|T311946]]; 3/3) (duration: 03m 34s)
* 07:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30830 and previous config saved to /var/cache/conftool/dbconfig/20220705-075050-root.json
* 07:50 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|c8c092a4133d119bf9aaece6f934ca7744ea6951}}: trwiki: Change old and new vector logos for 500k articles ([[phab:T311946|T311946]]; 2/3) (duration: 03m 36s)
* 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:46 urbanecm@deploy1002: Synchronized static/: {{Gerrit|c8c092a4133d119bf9aaece6f934ca7744ea6951}}: trwiki: Change old and new vector logos for 500k articles ([[phab:T311946|T311946]]; 1/3) (duration: 03m 17s)
* 07:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30829 and previous config saved to /var/cache/conftool/dbconfig/20220705-073904-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30828 and previous config saved to /var/cache/conftool/dbconfig/20220705-073546-root.json
* 07:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: {{Gerrit|ce64780fbd78a414c6ab08fc374186ae4dd58bac}}: SuggestedEdits: Adjust thumbnailSource logic ([[phab:T311789|T311789]]) (duration: 03m 32s)
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30827 and previous config saved to /var/cache/conftool/dbconfig/20220705-072400-root.json
* 07:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:21 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: {{Gerrit|d5050b773992aa6100aa14cd328836ff336ef8c1}}: Retrieve pages-with-suggestion via Elastic scroll directly ([[phab:T311476|T311476]]) (duration: 03m 32s)
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30826 and previous config saved to /var/cache/conftool/dbconfig/20220705-072043-root.json
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:17 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CentralNotice/includes/specials/CentralNotice.php: {{Gerrit|414b7b8a14b451f9bd0fb0c36d44fe6a9310102e}}: Only add tabs to special pages ([[phab:T311944|T311944]]) (duration: 03m 30s)
* 07:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|14df0e25aabf21715b281a9dbb5893ae2ae7db9a}}: zh(wikiversity{{!}}wiktionary): Disable local upload ([[phab:T312012|T312012]]) (duration: 03m 47s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30824 and previous config saved to /var/cache/conftool/dbconfig/20220705-070856-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30823 and previous config saved to /var/cache/conftool/dbconfig/20220705-070539-root.json
* 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Decommission db2073 [[phab:T311837|T311837]]', diff saved to https://phabricator.wikimedia.org/P30822 and previous config saved to /var/cache/conftool/dbconfig/20220705-070019-marostegui.json
* 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2073.codfw.wmnet
* 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30821 and previous config saved to /var/cache/conftool/dbconfig/20220705-065352-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30820 and previous config saved to /var/cache/conftool/dbconfig/20220705-065035-root.json
* 06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2073.codfw.wmnet
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30819 and previous config saved to /var/cache/conftool/dbconfig/20220705-063848-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30818 and previous config saved to /var/cache/conftool/dbconfig/20220705-063531-root.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30817 and previous config saved to /var/cache/conftool/dbconfig/20220705-063402-root.json
* 06:09 marostegui: dbmaint s6@eqiad [[phab:T298557|T298557]]
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 [[phab:T311522|T311522]]', diff saved to https://phabricator.wikimedia.org/P30816 and previous config saved to /var/cache/conftool/dbconfig/20220705-060526-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T311522|T311522]]', diff saved to https://phabricator.wikimedia.org/P30814 and previous config saved to /var/cache/conftool/dbconfig/20220705-060111-marostegui.json
* 06:00 marostegui: Starting s6 eqiad failover from db1131 to db1173 - [[phab:T311522|T311522]]
* 05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 05:58 TimStarling: deploying multi-DC support g 801621, manual puppet run on cp1080
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 [[phab:T311522|T311522]]', diff saved to https://phabricator.wikimedia.org/P30813 and previous config saved to /var/cache/conftool/dbconfig/20220705-052219-marostegui.json
* 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T311522|T311522]]
* 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T311522|T311522]]
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
== 2022-07-04 ==
== 2022-07-04 ==
* 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org
* 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org

Revision as of 23:30, 5 July 2022

2022-07-05

  • 23:30 ebernhardson: start restore of commonswiki_file from thanos-swift to cloudelastic
  • 23:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - T309648
  • 22:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - T309648
  • 22:28 ryankemper: T309648 Manually restarting `cloudelastic1006` before proceeding to a normal rolling restart of cloudelastic
  • 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:55 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable title above tabs everywhere (T311773) (duration: 03m 23s)
  • 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:35 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: cirrus: Disable commonswiki writes to cloudelastic (T309648) (duration: 03m 42s)
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:27 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: job queue: Squelch errors related to unwritable cloudelastic (T309648) (duration: 03m 37s)
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:19 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: job queue: Squelch errors related to unwritable cloudelastic (T309648) (duration: 03m 43s)
  • 20:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2174.codfw.wmnet with OS bullseye
  • 20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
  • 20:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2173.codfw.wmnet with OS bullseye
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: cirrus: Disable commonswiki writes to cloudelastic (T309648) (duration: 03m 23s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2174.codfw.wmnet with OS bullseye
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 66c9730: QuickSurveys: Increase coverage of research-incentive survey (T311015) (duration: 03m 28s)
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
  • 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2171.codfw.wmnet with OS bullseye
  • 20:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b1c2171: GrowthExperiments: End mailing list campaign on eswiki (T307985) (duration: 03m 39s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2173.codfw.wmnet with OS bullseye
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2172.codfw.wmnet with OS bullseye
  • 19:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
  • 19:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2172.codfw.wmnet with OS bullseye
  • 18:53 papaul: power down moss-be2002 for NVMe installation
  • 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab2001.wikimedia.org
  • 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db2171.codfw.wmnet with OS bullseye
  • 18:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:40 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.wikimedia.org
  • 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab2001.codfw.wmnet
  • 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2170.codfw.wmnet with OS bullseye
  • 18:36 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:32 papaul: power down moss-be2001 for NVMe installation
  • 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 18:32 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.codfw.wmnet
  • 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
  • 18:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
  • 18:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
  • 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2174
  • 18:01 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2174
  • 18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2173
  • 18:00 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2173
  • 17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2172
  • 17:59 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2172
  • 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2171
  • 17:57 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2171
  • 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2170
  • 17:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2170
  • 17:54 mutante: disabling puppet on gitlab* - debugging gerrit:811276
  • 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
  • 17:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2170.codfw.wmnet with OS bullseye
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:33 moritzm: installing haproxy security updates on stretch
  • 17:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
  • 16:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2169.codfw.wmnet with OS bullseye
  • 16:44 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
  • 16:34 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2169.codfw.wmnet with OS bullseye
  • 16:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2164.codfw.wmnet with OS bullseye
  • 15:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
  • 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
  • 15:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2164.codfw.wmnet with OS bullseye
  • 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:09 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2169
  • 15:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2169
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db2169
  • 15:05 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db2169
  • 15:05 moritzm: installing firejail updates on stretch
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
  • 15:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:00 moritzm: draining ganeti2024 for eventual reimage T311686
  • 14:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
  • 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:22 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 14:22 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 14:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:33 urbanecm: UTC afternoon B&C window done
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 urbanecm@deploy1002: Synchronized w/static.php: 300ef4a: static.php: Update call to deprecated IContextSource::getStats (duration: 03m 41s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:15 urbanecm@deploy1002: Synchronized wmf-config/: 1287b96: Drop deprecated feature flags (T310684) (duration: 03m 32s)
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 891057f: Drop dependent feature flags (T310684) (duration: 03m 37s)
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 12:42 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30861 and previous config saved to /var/cache/conftool/dbconfig/20220705-124101-ladsgroup.json
  • 12:37 btullis@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 12:36 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 12:31 moritzm: draining ganeti2023 for eventual reimage T311686
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'T311106', diff saved to https://phabricator.wikimedia.org/P30859 and previous config saved to /var/cache/conftool/dbconfig/20220705-122941-ladsgroup.json
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2158 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30848 and previous config saved to /var/cache/conftool/dbconfig/20220705-110432-marostegui.json
  • 11:01 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:30 _joe_: running benchmarks in codfw for php7.2/7.4 comparison.
  • 10:29 moritzm: sudo gnt-cluster upgrade --to 3.0 for ganeti/codfw T311686
  • 10:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
  • 10:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:00 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.19 refs T308072
  • 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:36 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.19 refs T308072 (duration: 34m 21s)
  • 09:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
  • 09:33 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
  • 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:02 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.19 refs T308072
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0)
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces
  • 08:43 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:30 moritzm: uploaded 7.4.30-3+0~20220627.69+debian10~1.gbpf2b381+wmf1+buster3 to component/php74 (pulling php-common with the socket helper) T311386
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30835 and previous config saved to /var/cache/conftool/dbconfig/20220705-082415-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30834 and previous config saved to /var/cache/conftool/dbconfig/20220705-082058-root.json
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30833 and previous config saved to /var/cache/conftool/dbconfig/20220705-080911-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30832 and previous config saved to /var/cache/conftool/dbconfig/20220705-080554-root.json
  • 07:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 89aef54: MentorDashboard: enable the Vue version of the dashboard in beta (T300532) (duration: 03m 18s)
  • 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30831 and previous config saved to /var/cache/conftool/dbconfig/20220705-075408-root.json
  • 07:54 urbanecm@deploy1002: Synchronized logos/config.yaml: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 3/3) (duration: 03m 34s)
  • 07:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30830 and previous config saved to /var/cache/conftool/dbconfig/20220705-075050-root.json
  • 07:50 urbanecm@deploy1002: Synchronized wmf-config/: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 2/3) (duration: 03m 36s)
  • 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:46 urbanecm@deploy1002: Synchronized static/: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 1/3) (duration: 03m 17s)
  • 07:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30829 and previous config saved to /var/cache/conftool/dbconfig/20220705-073904-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30828 and previous config saved to /var/cache/conftool/dbconfig/20220705-073546-root.json
  • 07:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: ce64780: SuggestedEdits: Adjust thumbnailSource logic (T311789) (duration: 03m 32s)
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30827 and previous config saved to /var/cache/conftool/dbconfig/20220705-072400-root.json
  • 07:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:21 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: d5050b7: Retrieve pages-with-suggestion via Elastic scroll directly (T311476) (duration: 03m 32s)
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30826 and previous config saved to /var/cache/conftool/dbconfig/20220705-072043-root.json
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:17 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CentralNotice/includes/specials/CentralNotice.php: 414b7b8: Only add tabs to special pages (T311944) (duration: 03m 30s)
  • 07:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 14df0e2: zh(wikiversity|wiktionary): Disable local upload (T312012) (duration: 03m 47s)
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30824 and previous config saved to /var/cache/conftool/dbconfig/20220705-070856-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30823 and previous config saved to /var/cache/conftool/dbconfig/20220705-070539-root.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Decommission db2073 T311837', diff saved to https://phabricator.wikimedia.org/P30822 and previous config saved to /var/cache/conftool/dbconfig/20220705-070019-marostegui.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2073.codfw.wmnet
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30821 and previous config saved to /var/cache/conftool/dbconfig/20220705-065352-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30820 and previous config saved to /var/cache/conftool/dbconfig/20220705-065035-root.json
  • 06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2073.codfw.wmnet
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30819 and previous config saved to /var/cache/conftool/dbconfig/20220705-063848-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30818 and previous config saved to /var/cache/conftool/dbconfig/20220705-063531-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30817 and previous config saved to /var/cache/conftool/dbconfig/20220705-063402-root.json
  • 06:09 marostegui: dbmaint s6@eqiad T298557
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 T311522', diff saved to https://phabricator.wikimedia.org/P30816 and previous config saved to /var/cache/conftool/dbconfig/20220705-060526-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T311522', diff saved to https://phabricator.wikimedia.org/P30814 and previous config saved to /var/cache/conftool/dbconfig/20220705-060111-marostegui.json
  • 06:00 marostegui: Starting s6 eqiad failover from db1131 to db1173 - T311522
  • 05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 05:58 TimStarling: deploying multi-DC support g 801621, manual puppet run on cp1080
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 T311522', diff saved to https://phabricator.wikimedia.org/P30813 and previous config saved to /var/cache/conftool/dbconfig/20220705-052219-marostegui.json
  • 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T311522
  • 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s6 T311522
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-07-04

  • 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org
  • 19:53 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 19:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2004-dev.wikimedia.org
  • 19:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
  • 19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 8 hosts with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 8 hosts with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30811 and previous config saved to /var/cache/conftool/dbconfig/20220704-192955-ladsgroup.json
  • 19:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2003-dev.wikimedia.org
  • 19:27 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2004-dev.wikimedia.org
  • 19:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
  • 19:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 19:17 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2003-dev.wikimedia.org
  • 19:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 19:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2001-dev.wikimedia.org
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P30810 and previous config saved to /var/cache/conftool/dbconfig/20220704-191450-ladsgroup.json
  • 19:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1003.wikimedia.org
  • 19:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org
  • 19:01 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.wikimedia.org
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P30809 and previous config saved to /var/cache/conftool/dbconfig/20220704-185945-ladsgroup.json
  • 18:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1004.wikimedia.org
  • 18:53 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
  • 18:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.wikimedia.org
  • 18:52 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
  • 18:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1003.wikimedia.org
  • 18:51 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30808 and previous config saved to /var/cache/conftool/dbconfig/20220704-184440-ladsgroup.json
  • 18:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.wikimedia.org
  • 18:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1003.wikimedia.org
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30807 and previous config saved to /var/cache/conftool/dbconfig/20220704-184231-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30806 and previous config saved to /var/cache/conftool/dbconfig/20220704-184211-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P30805 and previous config saved to /var/cache/conftool/dbconfig/20220704-182706-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P30804 and previous config saved to /var/cache/conftool/dbconfig/20220704-181200-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30803 and previous config saved to /var/cache/conftool/dbconfig/20220704-175655-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30802 and previous config saved to /var/cache/conftool/dbconfig/20220704-175446-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30801 and previous config saved to /var/cache/conftool/dbconfig/20220704-175425-ladsgroup.json
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P30800 and previous config saved to /var/cache/conftool/dbconfig/20220704-173920-ladsgroup.json
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P30799 and previous config saved to /var/cache/conftool/dbconfig/20220704-172415-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30798 and previous config saved to /var/cache/conftool/dbconfig/20220704-170910-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30797 and previous config saved to /var/cache/conftool/dbconfig/20220704-170800-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30796 and previous config saved to /var/cache/conftool/dbconfig/20220704-170740-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P30795 and previous config saved to /var/cache/conftool/dbconfig/20220704-165235-ladsgroup.json
  • 16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P30793 and previous config saved to /var/cache/conftool/dbconfig/20220704-163730-ladsgroup.json
  • 16:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30792 and previous config saved to /var/cache/conftool/dbconfig/20220704-162225-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30791 and previous config saved to /var/cache/conftool/dbconfig/20220704-162015-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30790 and previous config saved to /var/cache/conftool/dbconfig/20220704-161944-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P30789 and previous config saved to /var/cache/conftool/dbconfig/20220704-161817-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P30788 and previous config saved to /var/cache/conftool/dbconfig/20220704-160439-ladsgroup.json
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P30787 and previous config saved to /var/cache/conftool/dbconfig/20220704-160314-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P30786 and previous config saved to /var/cache/conftool/dbconfig/20220704-154933-ladsgroup.json
  • 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P30785 and previous config saved to /var/cache/conftool/dbconfig/20220704-154810-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30784 and previous config saved to /var/cache/conftool/dbconfig/20220704-153428-ladsgroup.json
  • 15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P30783 and previous config saved to /var/cache/conftool/dbconfig/20220704-153306-ladsgroup.json
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30782 and previous config saved to /var/cache/conftool/dbconfig/20220704-153218-ladsgroup.json
  • 15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T305300)', diff saved to https://phabricator.wikimedia.org/P30781 and previous config saved to /var/cache/conftool/dbconfig/20220704-152931-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 14:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 14:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Exempt WMCS ranges from globalblocking everywhere (T307648) (duration: 03m 26s)
  • 14:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 14:20 oblivian@deploy1002: Synchronized README: testing new php restart script (duration: 03m 23s)
  • 14:19 elukey: roll restart of thanos-fe's proxy to pick up a new account - T311628
  • 14:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 14:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 14:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 14:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 14:10 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set GlobalBlockingAllowedRanges for testwiki (T307648) (duration: 03m 39s)
  • 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:05 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 14:05 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 13:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 13:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 13:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 13:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 13:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 13:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 13:11 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 12:38 jynus: running alter table on dbbackups db T283017
  • 12:27 _joe_: updated etcdmirror to 0.0.8 everywhere
  • 12:17 moritzm: installing 4.9.320 on stretch hosts
  • 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:55 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GlobalBlocking/includes/GlobalBlocking.php: Backport: Add statsd metric collection on db calls (T307648) (duration: 03m 26s)
  • 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:50 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addimage/AddImageArticleTarget.js: Backport: AddImageArticleTarget: Update to new mediaClass/mediaTag format (T311916) (duration: 03m 33s)
  • 11:36 marostegui@cumin2002: dbctl commit (dc=all): 'Add db2156 to s3 T311493', diff saved to https://phabricator.wikimedia.org/P30774 and previous config saved to /var/cache/conftool/dbconfig/20220704-113640-marostegui.json
  • 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:54 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/includes: Backport: Revert "Revert "RecentChange: Straight join to actor table when needed"" (T311360) (duration: 03m 49s)
  • 10:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:25 _joe_: rollback etcdmirror to 0.0.6 on conf2005
  • 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:25 godog: silence etcd p a g e
  • 10:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:21 _joe_: restarting etcdmirror on conf2005
  • 10:21 moritzm: installing gnupg2 security updates
  • 10:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:17 _joe_: upgraded etcdmirror to 0.0.7 on conf2006, now going with the rest of codfw
  • 10:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:24 marostegui@cumin2002: dbctl commit (dc=all): 'Add db2157 to s5 T311493', diff saved to https://phabricator.wikimedia.org/P30758 and previous config saved to /var/cache/conftool/dbconfig/20220704-082406-marostegui.json
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MewOphaswongse out of all services on: 634 hosts
  • 08:07 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MewOphaswongse out of all services on: 634 hosts
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MewOphaswongse out of all services on: 1299 hosts
  • 08:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MewOphaswongse out of all services on: 1299 hosts
  • 08:04 elukey: kill leftover processes of user `mewoph` on stat100x to allow puppet runs
  • 07:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
  • 07:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
  • 06:49 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2092.codfw.wmnet
  • 06:47 marostegui@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:43 marostegui@cumin2002: START - Cookbook sre.dns.netbox
  • 06:39 marostegui@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2092.codfw.wmnet
  • 06:34 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2091.codfw.wmnet
  • 06:32 marostegui@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:28 marostegui@cumin2002: START - Cookbook sre.dns.netbox
  • 06:24 marostegui@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2091.codfw.wmnet
  • 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: codfw s4 sanitarium master switch
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: codfw s4 sanitarium master switch

2022-07-03

  • 11:36 _joe_: temporarily raised replicas for shellbox to 24
  • 11:35 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply

2022-07-02

  • 05:36 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:36 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:24 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:23 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:21 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 05:20 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:11 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:11 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 04:49 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 04:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 04:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 04:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:59 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:59 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:56 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:56 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 02:49 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 02:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 01:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance

2022-07-01

  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30753 and previous config saved to /var/cache/conftool/dbconfig/20220701-235524-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30752 and previous config saved to /var/cache/conftool/dbconfig/20220701-234019-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30751 and previous config saved to /var/cache/conftool/dbconfig/20220701-232514-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30750 and previous config saved to /var/cache/conftool/dbconfig/20220701-231009-ladsgroup.json
  • 23:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1012.eqiad.wmnet with reason: host reimage
  • 22:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1012.eqiad.wmnet with reason: host reimage
  • 22:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1015.eqiad.wmnet with OS bullseye
  • 22:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1015.eqiad.wmnet with reason: host reimage
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30749 and previous config saved to /var/cache/conftool/dbconfig/20220701-221438-ladsgroup.json
  • 22:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1015.eqiad.wmnet with reason: host reimage
  • 22:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30748 and previous config saved to /var/cache/conftool/dbconfig/20220701-221418-ladsgroup.json
  • 22:12 mutante: restbase2018 - attempting power cycle via mgmt - /admin1-> racadm serveraction powercycle (T311890)
  • 22:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1014.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1013.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1008.eqiad.wmnet with OS bullseye
  • 22:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1010.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P30747 and previous config saved to /var/cache/conftool/dbconfig/20220701-215913-ladsgroup.json
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1009.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1011.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 21:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1014.eqiad.wmnet with reason: host reimage
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1010.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1008.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1013.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1008.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1013.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1010.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1014.eqiad.wmnet with reason: host reimage
  • 21:48 mutante: https://doc.wikimedia.org switched to doc1002 backend on buster T247653
  • 21:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stat1009.eqiad.wmnet with OS bullseye
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P30746 and previous config saved to /var/cache/conftool/dbconfig/20220701-214408-ladsgroup.json
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1010.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1008.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1013.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1009.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 21:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1014.eqiad.wmnet with OS bullseye
  • 21:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1006.eqiad.wmnet with OS bullseye
  • 21:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat1009.eqiad.wmnet with reason: host reimage
  • 21:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stat1009.eqiad.wmnet with reason: host reimage
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30745 and previous config saved to /var/cache/conftool/dbconfig/20220701-212903-ladsgroup.json
  • 21:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1006.eqiad.wmnet with reason: host reimage
  • 21:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host stat1009.eqiad.wmnet with OS bullseye
  • 21:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1006.eqiad.wmnet with reason: host reimage
  • 21:09 mutante: https://doc.wikimedia.org - scheduled maintenance period - switching to buster backend doc1002 (T247653)
  • 21:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
  • 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30744 and previous config saved to /var/cache/conftool/dbconfig/20220701-203251-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30743 and previous config saved to /var/cache/conftool/dbconfig/20220701-203231-ladsgroup.json
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P30742 and previous config saved to /var/cache/conftool/dbconfig/20220701-201726-ladsgroup.json
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P30741 and previous config saved to /var/cache/conftool/dbconfig/20220701-200221-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30740 and previous config saved to /var/cache/conftool/dbconfig/20220701-194716-ladsgroup.json
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30739 and previous config saved to /var/cache/conftool/dbconfig/20220701-183504-ladsgroup.json
  • 18:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30738 and previous config saved to /var/cache/conftool/dbconfig/20220701-183444-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P30737 and previous config saved to /var/cache/conftool/dbconfig/20220701-181939-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P30736 and previous config saved to /var/cache/conftool/dbconfig/20220701-180434-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30735 and previous config saved to /var/cache/conftool/dbconfig/20220701-174929-ladsgroup.json
  • 17:47 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 17:47 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30734 and previous config saved to /var/cache/conftool/dbconfig/20220701-165407-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30733 and previous config saved to /var/cache/conftool/dbconfig/20220701-165347-ladsgroup.json
  • 16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P30732 and previous config saved to /var/cache/conftool/dbconfig/20220701-163842-ladsgroup.json
  • 16:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS bullseye
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P30731 and previous config saved to /var/cache/conftool/dbconfig/20220701-162337-ladsgroup.json
  • 16:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30730 and previous config saved to /var/cache/conftool/dbconfig/20220701-160831-ladsgroup.json
  • 15:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS bullseye
  • 15:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2167.codfw.wmnet with OS bullseye
  • 15:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2166.codfw.wmnet with OS bullseye
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 15:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
  • 15:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 15:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 15:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore[1008-1009]
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30729 and previous config saved to /var/cache/conftool/dbconfig/20220701-145937-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2167.codfw.wmnet with OS bullseye
  • 14:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2166.codfw.wmnet with OS bullseye
  • 14:39 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudstore[1008-1009]
  • 14:05 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 14:04 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30728 and previous config saved to /var/cache/conftool/dbconfig/20220701-135831-ladsgroup.json
  • 13:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P30727 and previous config saved to /var/cache/conftool/dbconfig/20220701-134326-ladsgroup.json
  • 13:43 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:36 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:36 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P30726 and previous config saved to /var/cache/conftool/dbconfig/20220701-132821-ladsgroup.json
  • 13:23 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 13:23 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:19 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:19 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30725 and previous config saved to /var/cache/conftool/dbconfig/20220701-131316-ladsgroup.json
  • 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:08 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2155 to s4 T311493', diff saved to https://phabricator.wikimedia.org/P30724 and previous config saved to /var/cache/conftool/dbconfig/20220701-130106-marostegui.json
  • 12:38 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:38 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 12:37 moritzm: uploaded rsyslog 8.2102.0-2+deb11u1+wmf2 to component/rsyslog-k8s (backport of latest security fixes on top of the rsyslog with mmkubernetes plugin)
  • 12:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:09 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30723 and previous config saved to /var/cache/conftool/dbconfig/20220701-120657-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30722 and previous config saved to /var/cache/conftool/dbconfig/20220701-120636-ladsgroup.json
  • 12:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30721 and previous config saved to /var/cache/conftool/dbconfig/20220701-115414-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P30720 and previous config saved to /var/cache/conftool/dbconfig/20220701-115131-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P30719 and previous config saved to /var/cache/conftool/dbconfig/20220701-113909-ladsgroup.json
  • 11:38 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 11:38 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P30718 and previous config saved to /var/cache/conftool/dbconfig/20220701-113626-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P30717 and previous config saved to /var/cache/conftool/dbconfig/20220701-112404-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30716 and previous config saved to /var/cache/conftool/dbconfig/20220701-112121-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30715 and previous config saved to /var/cache/conftool/dbconfig/20220701-110859-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30714 and previous config saved to /var/cache/conftool/dbconfig/20220701-110204-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30713 and previous config saved to /var/cache/conftool/dbconfig/20220701-110117-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P30712 and previous config saved to /var/cache/conftool/dbconfig/20220701-104612-ladsgroup.json
  • 10:45 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 10:45 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 10:44 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 10:44 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P30711 and previous config saved to /var/cache/conftool/dbconfig/20220701-103107-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30710 and previous config saved to /var/cache/conftool/dbconfig/20220701-102810-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30709 and previous config saved to /var/cache/conftool/dbconfig/20220701-101602-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30708 and previous config saved to /var/cache/conftool/dbconfig/20220701-094927-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:35 marostegui: Stop mysql on db2073 for cloning db2155
  • 07:47 mmandere: kubemaster2001, restart rsyslog
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2154 to s8 T311493', diff saved to https://phabricator.wikimedia.org/P30705 and previous config saved to /var/cache/conftool/dbconfig/20220701-074607-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2153 to s1 T311493', diff saved to https://phabricator.wikimedia.org/P30704 and previous config saved to /var/cache/conftool/dbconfig/20220701-073512-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2091 from dbctl T311803', diff saved to https://phabricator.wikimedia.org/P30703 and previous config saved to /var/cache/conftool/dbconfig/20220701-060000-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2092 from dbctl T311802', diff saved to https://phabricator.wikimedia.org/P30701 and previous config saved to /var/cache/conftool/dbconfig/20220701-054102-marostegui.json
  • 02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2165.codfw.wmnet with OS bullseye
  • 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
  • 02:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
  • 02:06 krinkle@deploy1002: Synchronized wmf-config/: I60edfb0f60 (3/3) (duration: 03m 31s)
  • 02:01 krinkle@deploy1002: Synchronized multiversion/: I60edfb0f60 (2/3) (duration: 03m 34s)
  • 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2165.codfw.wmnet with OS bullseye
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2163.codfw.wmnet with OS bullseye
  • 01:39 krinkle@deploy1002: Synchronized tests/: I60edfb0f60 (1/3) (duration: 03m 32s)
  • 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
  • 01:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
  • 01:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:30 krinkle@deploy1002: Synchronized src/: I796f38 (3/3) (duration: 03m 24s)
  • 01:26 krinkle@deploy1002: Synchronized multiversion/: I796f38 (2/3) (duration: 03m 32s)
  • 01:23 krinkle@deploy1002: Synchronized tests/: I796f38 (1/3) (duration: 03m 41s)
  • 01:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2162.codfw.wmnet with OS bullseye
  • 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2163.codfw.wmnet with OS bullseye
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2162.codfw.wmnet with reason: host reimage
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2161.codfw.wmnet with OS bullseye
  • 00:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2162.codfw.wmnet with reason: host reimage
  • 00:53 ejegg: updated payments-wiki from ef53c82e to 78dee85e
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2168.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2167.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
  • 00:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
  • 00:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2162.codfw.wmnet with OS bullseye
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2168.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2167.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2166.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2165.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2161.codfw.wmnet with OS bullseye
  • 00:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2166.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2163.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2165.mgmt.codfw.wmnet with reboot policy FORCED

Archives

See Server Admin Log/Archives.