Server Admin Log: Difference between revisions
imported>Stashbot (mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply)
imported>Stashbot (ebernhardson: start restore of commonswiki_file from thanos-swift to cloudelastic)
(42 intermediate revisions by the same user not shown)
== 2022-07-05 ==
* 23:30 ebernhardson: start restore of commonswiki_file from thanos-swift to cloudelastic
* 23:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - [[phab:T309648|T309648]]
* 22:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - [[phab:T309648|T309648]]
* 22:28 ryankemper: [[phab:T309648|T309648]] Manually restarting `cloudelastic1006` before proceeding to a normal rolling restart of cloudelastic
* 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:55 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811339{{!}}Enable title above tabs everywhere (T311773)]] (duration: 03m 23s)
* 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:35 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: [[gerrit:811350{{!}}cirrus: Disable commonswiki writes to cloudelastic (T309648)]] (duration: 03m 42s)
* 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:27 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: [[gerrit:811279{{!}}job queue: Squelch errors related to unwritable cloudelastic (T309648)]] (duration: 03m 37s)
* 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:19 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: [[gerrit:811280{{!}}job queue: Squelch errors related to unwritable cloudelastic (T309648)]] (duration: 03m 43s)
* 20:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2174.codfw.wmnet with OS bullseye
* 20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
* 20:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2173.codfw.wmnet with OS bullseye
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811350{{!}}cirrus: Disable commonswiki writes to cloudelastic (T309648)]] (duration: 03m 23s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2174.codfw.wmnet with OS bullseye
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|66c973087b7736b22ce7edb5b830e50e31710e4a}}: QuickSurveys: Increase coverage of research-incentive survey ([[phab:T311015|T311015]]) (duration: 03m 28s)
* 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
* 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2171.codfw.wmnet with OS bullseye
* 20:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b1c217103753d886ab5b18b88f112ec26931bff2}}: GrowthExperiments: End mailing list campaign on eswiki ([[phab:T307985|T307985]]) (duration: 03m 39s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
* 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
* 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
* 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2173.codfw.wmnet with OS bullseye
* 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2172.codfw.wmnet with OS bullseye
* 19:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
* 19:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
* 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2172.codfw.wmnet with OS bullseye
* 18:53 papaul: power down moss-be2002 for NVMe installation
* 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab2001.wikimedia.org
* 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db2171.codfw.wmnet with OS bullseye
* 18:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:40 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.wikimedia.org
* 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab2001.codfw.wmnet
* 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2170.codfw.wmnet with OS bullseye
* 18:36 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:32 papaul: power down moss-be2001 for NVMe installation
* 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
* 18:32 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.codfw.wmnet
* 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
* 18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
* 18:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
* 18:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
* 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2174
* 18:01 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2174
* 18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2173
* 18:00 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2173
* 17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2172
* 17:59 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2172
* 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2171
* 17:57 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2171
* 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2170
* 17:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2170
* 17:54 mutante: disabling puppet on gitlab* - debugging gerrit:811276
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
* 17:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2170.codfw.wmnet with OS bullseye
* 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
* 17:33 moritzm: installing haproxy security updates on stretch
* 17:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
* 16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
* 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
* 16:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
* 16:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2169.codfw.wmnet with OS bullseye
* 16:44 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
* 16:34 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
* 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
* 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
* 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
* 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
* 16:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2169.codfw.wmnet with OS bullseye
* 16:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2164.codfw.wmnet with OS bullseye
* 15:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
* 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
* 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
* 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
* 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
* 15:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
* 15:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2164.codfw.wmnet with OS bullseye
* 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
* 15:09 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2169
* 15:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2169
* 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db2169
* 15:05 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db2169
* 15:05 moritzm: installing firejail updates on stretch
* 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
* 15:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:00 moritzm: draining ganeti2024 for eventual reimage [[phab:T311686|T311686]]
* 14:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
* 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
* 14:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
* 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:22 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 14:22 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 14:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:34 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:33 urbanecm: UTC afternoon B&C window done
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|300ef4a5ee6f0c35de831e88eb2f8169e7f66e97}}: static.php: Update call to deprecated IContextSource::getStats (duration: 03m 41s)
* 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:15 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|1287b969fc42aee6efae5ff1f1943394ba35e326}}: Drop deprecated feature flags ([[phab:T310684|T310684]]) (duration: 03m 32s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|891057f6ba555b2ece0424e3364d853eb20555da}}: Drop dependent feature flags ([[phab:T310684|T310684]]) (duration: 03m 37s)
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
* 12:42 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
* 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30861 and previous config saved to /var/cache/conftool/dbconfig/20220705-124101-ladsgroup.json
* 12:37 btullis@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
* 12:36 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
* 12:31 moritzm: draining ganeti2023 for eventual reimage [[phab:T311686|T311686]]
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): '[[phab:T311106|T311106]]', diff saved to https://phabricator.wikimedia.org/P30859 and previous config saved to /var/cache/conftool/dbconfig/20220705-122941-ladsgroup.json
* 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2158 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P30848 and previous config saved to /var/cache/conftool/dbconfig/20220705-110432-marostegui.json
* 11:01 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:30 _joe_: running benchmarks in codfw for php7.2/7.4 comparison.
* 10:29 moritzm: sudo gnt-cluster upgrade --to 3.0 for ganeti/codfw [[phab:T311686|T311686]]
* 10:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
* 10:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
* 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:00 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.19 refs [[phab:T308072|T308072]]
* 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:36 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.19 refs [[phab:T308072|T308072]] (duration: 34m 21s)
* 09:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
* 09:33 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
* 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:02 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.19 refs [[phab:T308072|T308072]]
* 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0)
* 08:52 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces
* 08:43 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 08:30 moritzm: uploaded 7.4.30-3+0~20220627.69+debian10~1.gbpf2b381+wmf1+buster3 to component/php74 (pulling php-common with the socket helper) [[phab:T311386|T311386]]
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30835 and previous config saved to /var/cache/conftool/dbconfig/20220705-082415-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30834 and previous config saved to /var/cache/conftool/dbconfig/20220705-082058-root.json
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30833 and previous config saved to /var/cache/conftool/dbconfig/20220705-080911-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30832 and previous config saved to /var/cache/conftool/dbconfig/20220705-080554-root.json
* 07:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|89aef540e22aaded6c279d9d11c769507e497b6a}}: MentorDashboard: enable the Vue version of the dashboard in beta ([[phab:T300532|T300532]]) (duration: 03m 18s) | |||
* 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | |||
* | * 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30831 and previous config saved to /var/cache/conftool/dbconfig/20220705-075408-root.json | ||
* | * 07:54 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|c8c092a4133d119bf9aaece6f934ca7744ea6951}}: trwiki: Change old and new vector logos for 500k articles ([[phab:T311946|T311946]]; 3/3) (duration: 03m 34s) | ||
* 07:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply | |||
* | * 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | ||
* 07: | * 07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply | ||
* 07: | * 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30830 and previous config saved to /var/cache/conftool/dbconfig/20220705-075050-root.json | ||
* 07:50 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|c8c092a4133d119bf9aaece6f934ca7744ea6951}}: trwiki: Change old and new vector logos for 500k articles ([[phab:T311946|T311946]]; 2/3) (duration: 03m 36s) | |||
* 07: | * 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | ||
* 07: | * 07:46 urbanecm@deploy1002: Synchronized static/: {{Gerrit|c8c092a4133d119bf9aaece6f934ca7744ea6951}}: trwiki: Change old and new vector logos for 500k articles ([[phab:T311946|T311946]]; 1/3) (duration: 03m 17s) | ||
* 07:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply | |||
* 07:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | |||
* 07: | * 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply | ||
* 07: | * 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30829 and previous config saved to /var/cache/conftool/dbconfig/20220705-073904-root.json | ||
* 07: | * 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30828 and previous config saved to /var/cache/conftool/dbconfig/20220705-073546-root.json | ||
* 07: | * 07:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: {{Gerrit|ce64780fbd78a414c6ab08fc374186ae4dd58bac}}: SuggestedEdits: Adjust thumbnailSource logic ([[phab:T311789|T311789]]) (duration: 03m 32s) | ||
* 07: | * 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | ||
* 07: | * 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply | ||
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply | |||
* 07: | * 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply | ||
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30827 and previous config saved to /var/cache/conftool/dbconfig/20220705-072400-root.json | |||
* 07: | * 07:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply | ||
* 07:21 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: {{Gerrit|d5050b773992aa6100aa14cd328836ff336ef8c1}}: Retrieve pages-with-suggestion via Elastic scroll directly ([[phab:T311476|T311476]]) (duration: 03m 32s)
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30826 and previous config saved to /var/cache/conftool/dbconfig/20220705-072043-root.json
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:17 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CentralNotice/includes/specials/CentralNotice.php: {{Gerrit|414b7b8a14b451f9bd0fb0c36d44fe6a9310102e}}: Only add tabs to special pages ([[phab:T311944|T311944]]) (duration: 03m 30s)
* 07:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|14df0e25aabf21715b281a9dbb5893ae2ae7db9a}}: zh(wikiversity{{!}}wiktionary): Disable local upload ([[phab:T312012|T312012]]) (duration: 03m 47s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30824 and previous config saved to /var/cache/conftool/dbconfig/20220705-070856-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30823 and previous config saved to /var/cache/conftool/dbconfig/20220705-070539-root.json
* 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Decommission db2073 [[phab:T311837|T311837]]', diff saved to https://phabricator.wikimedia.org/P30822 and previous config saved to /var/cache/conftool/dbconfig/20220705-070019-marostegui.json
* 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2073.codfw.wmnet
* 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30821 and previous config saved to /var/cache/conftool/dbconfig/20220705-065352-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30820 and previous config saved to /var/cache/conftool/dbconfig/20220705-065035-root.json
* 06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2073.codfw.wmnet
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30819 and previous config saved to /var/cache/conftool/dbconfig/20220705-063848-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30818 and previous config saved to /var/cache/conftool/dbconfig/20220705-063531-root.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30817 and previous config saved to /var/cache/conftool/dbconfig/20220705-063402-root.json
* 06:09 marostegui: dbmaint s6@eqiad [[phab:T298557|T298557]]
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 [[phab:T311522|T311522]]', diff saved to https://phabricator.wikimedia.org/P30816 and previous config saved to /var/cache/conftool/dbconfig/20220705-060526-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T311522|T311522]]', diff saved to https://phabricator.wikimedia.org/P30814 and previous config saved to /var/cache/conftool/dbconfig/20220705-060111-marostegui.json
* 06:00 marostegui: Starting s6 eqiad failover from db1131 to db1173 - [[phab:T311522|T311522]]
* 05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 05:58 TimStarling: deploying multi-DC support g 801621, manual puppet run on cp1080
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 [[phab:T311522|T311522]]', diff saved to https://phabricator.wikimedia.org/P30813 and previous config saved to /var/cache/conftool/dbconfig/20220705-052219-marostegui.json
* 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T311522|T311522]]
* 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s6 [[phab:T311522|T311522]]
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

== 2022-07-04 ==
* 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org
* 19:53 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
* 19:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2004-dev.wikimedia.org
* 19:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:25 godog: silence etcd p a g e
* 10:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:23 |