You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(marostegui: Start event scheduler on db1115 after a massive delete - T252324)
imported>Stashbot
(gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply)
 
(941 intermediate revisions by 4 users not shown)
== 2020-05-10 ==
* 12:18 marostegui: Start event scheduler on db1115 after a massive delete - [[phab:T252324|T252324]]
* 11:05 marostegui: Stop event scheduler on db1115 to perform a massive delete - [[phab:T252324|T252324]]
* 10:27 dcausse: restarting blazegraph on wdqs1004: [[phab:T242453|T242453]]
* 09:56 marostegui: Change scaling_governor from powersave to performance on db1115 - [[phab:T252324|T252324]]
* 09:25 marostegui: Stop MySQL and restart db1115 - [[phab:T252324|T252324]]
* 08:50 marostegui: Restart mysql on db1115 to change buffer pool size from 20GB to 40GB [[phab:T252324|T252324]]
* 08:44 elukey: Power cycle analytics1052 after eno1 issue
* 08:01 marostegui: Disable unused events like %_schema [[phab:T252324|T252324]] [[phab:T231185|T231185]]
* 07:11 marostegui: Restart mysql on db1115 [[phab:T231185|T231185]]
* 07:11 marostegui: Truncate tendril.processlist_query_log [[phab:T231185|T231185]]


== 2020-05-08 ==
* 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 [[phab:T251598|T251598]]
* 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view on labsdb1012 [[phab:T251598|T251598]]
* 21:33 bstorm_: cleaning up wb_terms_no_longer_updated view on labsdb1009 [[phab:T251598|T251598]]
* 21:06 ottomata: running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - [[phab:T252203|T252203]]
* 19:16 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:12 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:07 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:12 andrewbogott: reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for [[phab:T252121|T252121]]
* 17:59 marostegui: Extend /srv by 500G on labsdb1011 [[phab:T249188|T249188]]
* 16:55 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:53 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:36 ottomata: starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - [[phab:T252203|T252203]]
* 15:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:27 ottomata: stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - [[phab:T252203|T252203]]
* 14:50 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s)
* 14:50 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only
* 14:05 akosiaris: [[phab:T243106|T243106]] undo experiment with DROP iptables rules this time around. Use mw1331, mw1348
* 13:22 vgutierrez: rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - [[phab:T249335|T249335]]
* 13:20 akosiaris: [[phab:T243106|T243106]] redo experiment with DROP iptables rules this time around. Use mw1331, mw1348
* 13:16 akosiaris: [[phab:T243106|T243106]] undo experiment with REJECT, DROP iptables rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment done successfully, no issues for the infrastructure.
* 12:49 akosiaris: [[phab:T243106|T243106]] redo experiment with REJECT, DROP iptables rules now that we have envoy in the middle. Use mw1331, mw1348
* 12:49 akosiaris: [[phab:T243106|T243106]] redo experiment with REJECT, DROP iptables rules now that we have envoy in the middle
* 11:49 hnowlan: restarting cassandra on restbase2009 for java updates
* 11:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 akosiaris: repool eqiad eventgate-analytics. Test concluded
* 11:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:54 mutante: disabling puppet on puppetmasters temporarily to switch them carefully to use httpd module and not apache module which we want to get rid of
* 09:52 akosiaris: depool eqiad eventgate-analytics for a test involving reinitializing the eqiad kubernetes cluster
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:51 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:45 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=eventgate-analytics.*
* 08:20 vgutierrez: rolling restart of ats-tls on esams - [[phab:T249335|T249335]]
* 07:19 vgutierrez: ats-tls restart on cp3050 and cp3052 (max_connections_active_in experiment) - [[phab:T249335|T249335]]
* 07:07 mutante: phabricator rmdir /var/run/phd/pid - empty and now unused
* 07:01 moritzm: installing php5 security updates
* 05:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:10 marostegui: Upgrade pc1010
* 00:30 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert all wikis except test to 1.35.0-wmf.30 for [[phab:T252179|T252179]]
* 00:19 brennen: rolling 1.35.0-wmf.31 train back to group0 for [[phab:T252179|T252179]]


== 2023-03-24 ==
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply


== 2023-03-23 ==
* 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:30 mutante: moscovium - rebooting to finalize distro release upgrade - [[phab:T332952|T332952]]
* 22:20 mutante: moscovium performing apt-get full-upgrade [[phab:T332952|T332952]]
* 22:09 mutante: moscovium - when doing an in-place upgrade from buster to bullseye and you replace the string in sources.list, you also need to replace "bullseye-updates" with "bullseye-security" in the security.debian.org lines - that this is needed is called a bug at https://shagain.club/index.php/archives/641/ - [[phab:T327068|T327068]]
* 22:00 mutante: moscovium - apt-get full-upgrade ; apt autoremove ; replace buster with bullseye in sources.list ; repeat apt-get upgrade/full-upgrade etc. (https://wiki.debian.org/DebianUpgrade) [[phab:T327068|T327068]]
* 22:00 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc2002.codfw.wmnet with OS bullseye
* 21:57 mutante: moscovium - apt-get upgrade (rt.wikimedia.org going into maintenance) [[phab:T327068|T327068]]
* 21:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
* 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
* 21:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
* 21:45 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
* 21:31 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:25 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 21:24 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 20:42 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 20:42 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 20:35 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 20:34 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 20:33 taavi@deploy2002: Finished scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] (duration: 10m 56s)
* 20:33 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
* 20:24 taavi@deploy2002: abi and taavi: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:23 taavi@deploy2002: Started scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]]
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
* 19:36 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 19:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 19:31 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 19:31 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2002
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 19:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 19:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 19:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2002
* 18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]
* 17:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 17:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 17:39 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 17:38 mutante: moscovium - systemctl stop rsync
* 17:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 17:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 17:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 17:18 mutante: aphlict1001 - systemctl reset-failed; systemctl start logrotate ; systemctl start logrotate.timer
* 16:59 sukhe: rolling out CR 901333 to A:cp-text [[phab:T313578|T313578]]
* 16:45 sukhe: disable Puppet in A:cp to test and then merge CR 901333
* 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2002.codfw.wmnet with OS bullseye
* 16:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2002.codfw.wmnet with OS bullseye
* 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
* 16:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
* 16:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 16:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 16:01 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc1002.wikimedia.org with OS bullseye
* 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
* 15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
* 15:12 vgutierrez: testing haproxy_2.6.11-1~bpo11+wmf2_amd64.deb in text@ulsfo - [[phab:T332796|T332796]]
* 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc1002.wikimedia.org with OS bullseye
* 14:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
* 14:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host lists1003.wikimedia.org with OS bullseye
* 14:53 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:53 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:51 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
* 14:45 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
* 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1002.wikimedia.org
* 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
* 14:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host lists1003.wikimedia.org with OS bullseye
* 14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1002.wikimedia.org on all recursors
* 14:24 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc1002.wikimedia.org on all recursors
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
* 14:22 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host pybal-test2003.codfw.wmnet with OS bullseye
* 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
* 14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
* 14:16 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 14:15 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 14:15 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 14:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc1002.wikimedia.org
* 14:13 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 14:13 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 14:11 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d] (duration: 01m 32s)
* 14:11 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
* 14:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d]
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d] (duration: 00m 09s)
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d]
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d] (duration: 05m 10s)
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
* 14:03 joal@deploy2002: Started deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d]
* 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
* 13:55 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host pybal-test2003.codfw.wmnet with OS bullseye
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:46 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac] (duration: 01m 28s)
* 13:46 TheresNoTime: close UTC afternoon backport window
* 13:45 samtar@deploy2002: Finished scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] (duration: 07m 46s)
* 13:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac]
* 13:44 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac] (duration: 00m 08s)
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac]
* 13:43 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac] (duration: 13m 06s)
* 13:39 samtar@deploy2002: samtar: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:37 samtar@deploy2002: Started scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]]
* 13:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] (duration: 08m 05s)
* 13:30 joal@deploy2002: Started deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac]
* 13:29 samtar@deploy2002: samtar and sgimeno: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]]
* 13:26 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki ckbwiki --fix` [[phab:T332470|T332470]]
* 13:25 samtar@deploy2002: Finished scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] (duration: 08m 39s)
* 13:18 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:16 samtar@deploy2002: Started scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]]
* 13:15 samtar@deploy2002: Finished scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] (duration: 11m 47s)
* 13:08 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:03 samtar@deploy2002: Started scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]]
* 12:14 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-druid1001.eqiad.wmnet with OS bullseye
* 12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:58 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:57 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2004.codfw.wmnet with OS bullseye
* 11:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
* 11:47 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache upload cluster - [[phab:T332796|T332796]]
* 11:36 btullis@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-druid1001.eqiad.wmnet with OS bullseye
* 11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
* 11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
* 11:26 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc2002.wikimedia.org with OS bullseye
* 11:15 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2004.codfw.wmnet with OS bullseye
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
* 11:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
* 11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 11:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
* 10:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
* 10:44 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc2002.wikimedia.org with OS bullseye
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2002.wikimedia.org
* 10:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2005.codfw.wmnet with OS bullseye
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2002.wikimedia.org on all recursors
* 10:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc2002.wikimedia.org on all recursors
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
* 10:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
* 10:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc2002.wikimedia.org
* 10:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2005.codfw.wmnet with OS bullseye
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
* 09:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
* 09:47 moritzm: uploaded prometheus-druid-exporter 0.8-2 for bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]]
* 08:21 elukey: clean up docker and reboot kubernetes2024 to enable overlay2 - [[phab:T332803|T332803]]
* 08:11 vgutierrez: testing HAProxy 2.6.11 in cp4044 - [[phab:T332796|T332796]]
* 08:08 vgutierrez: fetch haproxy 2.6.11 in apt.wm.o thirdparty/haproxy26 for bullseye & buster
* 08:04 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache text cluster - [[phab:T332796|T332796]]
* 07:54 elukey: clean up docker and reboot kubernetes2023 to enable overlay2 - [[phab:T332803|T332803]]
* 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
* 07:42 elukey: clean up docker on kubernetes1024 (cordon + stop kubelet + docker + clean /var/lib/docker/*) and reboot to enable overlay2 - [[phab:T332803|T332803]]
* 07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
* 07:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45928 and previous config saved to /var/cache/conftool/dbconfig/20230323-072315-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45927 and previous config saved to /var/cache/conftool/dbconfig/20230323-070811-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45926 and previous config saved to /var/cache/conftool/dbconfig/20230323-065306-root.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45925 and previous config saved to /var/cache/conftool/dbconfig/20230323-063800-root.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45924 and previous config saved to /var/cache/conftool/dbconfig/20230323-062255-root.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45923 and previous config saved to /var/cache/conftool/dbconfig/20230323-060750-root.json
* 05:37 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 05:34 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 04:25 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 02:07 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 02:00 mutante: rsyncing ~4GB files for static-codereview.wikimedia.org from old to newer VMs for [[phab:T331896|T331896]] - no automatic sync / deploy for these
* 01:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]"
* 01:03 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]"
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 00:57 denisse@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host doc2002.codfw.wmnet with OS bullseye
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 00:27 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
* 00:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc1003.eqiad.wmnet with OS bullseye


== 2020-05-07 ==
* 22:36 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 22:31 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Scribunto/includes/engines/LuaCommon/TitleLibrary.php: [[gerrit:595054{{!}}Handle RevisionAccessException with try-catch (T252156)]] (duration: 01m 08s)
* 20:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:10 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingStreamNames: set initial stream names, as yet unused - [[phab:T238230|T238230]] (duration: 01m 07s)
* 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.35.0-wmf.30
* 19:09 brennen: rolling 1.35.0-wmf.31 back to group1
* 19:09 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki1001 - [[phab:T252010|T252010]]
* 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 18:25 ppchelko@deploy1001: Finished deploy [changeprop/deploy@383fba5]: Enable both purging types [[phab:T252142|T252142]] (duration: 01m 17s)
* 18:23 ppchelko@deploy1001: Started deploy [changeprop/deploy@383fba5]: Enable both purging types [[phab:T252142|T252142]]
* 18:15 Urbanecm: Morning SWAT done
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|899c175}}: Update project icons to refreshed SVGs ([[phab:T249047|T249047]]; part 2/2) (duration: 01m 06s)
* 18:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|899c175}}: Update project icons to refreshed SVGs ([[phab:T249047|T249047]]; part 1/2) (duration: 01m 08s)
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|54bd2f1}}: Add the investigate right to the checkuser group on testwiki ([[phab:T251932|T251932]]) (duration: 01m 08s)
* 17:50 bsitzmann@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:46 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:44 bsitzmann@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:44 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: (no justification provided) (duration: 05m 31s)
* 17:38 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: (no justification provided)
* 17:18 ejegg: updated payments-wiki from {{Gerrit|afb84cc391}} to {{Gerrit|dabba1804c}}
* 16:46 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption (duration: 01m 05s)
* 16:45 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption
* 16:42 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:36 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:29 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:26 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic (duration: 01m 45s)
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic
* 16:23 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic (duration: 00m 24s)
* 16:23 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic
* 15:59 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:51 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:36 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Collection/includes/Specials/SpecialCollection.php: [[phab:T251460|T251460]] Set skin on BaseTemplates if you are using getSkin (duration: 01m 08s)
* 15:28 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 15:27 vgutierrez: rolling restart of ats-tls on text@esams - [[phab:T249335|T249335]]
* 15:26 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:12 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:09 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:59 moritzm: imported component/facter3 for stretch-wikimedia into "main"
* 14:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 moritzm: imported component/puppet5 for stretch-wikimedia into "main"
* 14:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 14:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:17 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:07 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:06 moritzm: imported component/facter3 for jessie-wikimedia into "main"
* 13:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 13:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:04 jynus: disabling puppet on all db hosts to control deployment of new paging alert [[phab:T172489|T172489]]
* 13:02 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers (duration: 02m 43s)
* 13:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:59 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers
* 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:43 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI (duration: 16m 20s)
* 12:27 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI
* 12:13 addshore@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Wikibase: [[gerrit:594920]] [[phab:T252079|T252079]] Revert "Move prefetching-term-lookup-callback service wiring" (duration: 01m 12s)
* 12:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 moritzm: imported component/puppet5 for jessie-wikimedia into "main"
* 11:31 jbond42: enable ferm-status script https://gerrit.wikimedia.org/r/c/operations/puppet/+/576102
* 11:10 matthiasmullie: EU swat done
* 11:07 mlitn@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikibaseMediaInfo/: [MediaInfo] Add dummy concept chips without thumbnail (duration: 01m 09s)
* 10:07 moritzm: installing Java security updates on restbase/sessionstore
* 09:11 elukey: roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary)
* 08:32 moritzm: upgrading restbase-dev to latest OpenJDK security update
* 08:06 jynus: setting pc2007, pc2009 as read-write
* 07:44 godog: further decrease weight for ms-be10[678] - [[phab:T252008|T252008]]
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:33 elukey: restart hadoop yarn nodemanager on analytics1071
* 05:22 marostegui: Reimage db2078
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only=off for maintenance [[phab:T251158|T251158]]', diff saved to https://phabricator.wikimedia.org/P11167 and previous config saved to /var/cache/conftool/dbconfig/20200507-050419-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only for maintenance [[phab:T251158|T251158]]', diff saved to https://phabricator.wikimedia.org/P11166 and previous config saved to /var/cache/conftool/dbconfig/20200507-050046-marostegui.json
* 02:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.30 for [[phab:T252079|T252079]]
* 02:55 brennen: reverting group1 to 1.35.0-wmf.30 for [[phab:T252079|T252079]]
* 00:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)


== 2023-03-22 ==
* 23:59 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
* 23:56 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
* 23:46 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc1003.eqiad.wmnet with OS bullseye
* 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
* 23:34 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
* 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 23:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 23:32 zabe: zabe@mwmaint2002:~$ mwscript namespaceDupes.php wikimaniawiki --fix # [[phab:T332782|T332782]]
* 23:31 zabe@deploy2002: Finished scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] (duration: 10m 03s)
* 23:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1003.wikimedia.org
* 23:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 23:24 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
* 23:22 zabe@deploy2002: zabe: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:21 zabe@deploy2002: Started scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]]
* 21:15 taavi: UTC late backports complete
* 21:13 taavi@deploy2002: Finished scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] (duration: 07m 29s)
* 21:08 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc1003.eqiad.wmnet
* 21:08 taavi@deploy2002: taavi: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 21:06 taavi@deploy2002: Started scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]]
* 21:05 taavi@deploy2002: Finished scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] (duration: 07m 17s)
* 20:59 taavi@deploy2002: taavi: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:58 taavi@deploy2002: Started scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]]
* 20:54 samtar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:900748{{!}}Enable page tools for anonymous users (T331052)]] (duration: 10m 10s)
* 20:37 akosiaris: uncordon rebooted kubernetes1023. It was drained previously for [[phab:T332803|T332803]]
* 20:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] (duration: 11m 47s)
* 20:32 akosiaris: reboot kubernetes1023 for a test once more - [[phab:T332803|T332803]]
* 20:32 akosiaris: reboot kubernetes1023 for a test once more
* 20:28 samtar@deploy2002: samtar and nray: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:25 akosiaris: reboot kubernetes1023 for a test
* 20:24 samtar@deploy2002: Started scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]]
* 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] (duration: 09m 57s)
* 20:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists1003.wikimedia.org on all recursors
* 20:15 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache lists1003.wikimedia.org on all recursors
* 20:15 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:15 samtar@deploy2002: kharlan and samtar: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]]
* 20:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.eqiad.wmnet on all recursors
* 20:11 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.eqiad.wmnet on all recursors
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
* 20:10 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
* 20:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] (duration: 07m 22s)
* 20:07 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:07 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.eqiad.wmnet
* 20:07 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 20:07 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1003.wikimedia.org
* 20:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doc1003.wikimedia.org
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
* 20:06 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
* 20:05 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:04 samtar@deploy2002: samtar and matmarex: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:02 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0 (duration: 00m 21s)
* 20:02 samtar@deploy2002: Started scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]]
* 20:02 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0
* 20:01 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:01 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.wikimedia.org
* 18:16 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]]
* 18:12 mutante: rsyncing /srv/org/wikimedia/sitemaps files for https://sitemaps.wikimedia.org from old to new machines. Most other things are auto-deployed by puppet, by puppet running the initial scap, or by automatic rsync; this is not. rsync -av /srv/org/wikimedia/sitemaps/ rsync://miscweb2003.codfw.wmnet/miscapps-srv/org/wikimedia/sitemaps/ [[phab:T331896|T331896]] - but also see [[phab:T332101|T332101]]
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dborch1002.wikimedia.org
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
* 17:38 _joe_: stopping apache on mwdebug1001 to test the new envoy error page
* 17:15 hashar@deploy2002: Synchronized composer.json: build: add local typos check to composer.json # [[phab:T332121|T332121]] (duration: 06m 44s)
* 17:12 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
* 17:09 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 17:06 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 17:06 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 17:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts dborch1002.wikimedia.org
* 17:05 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 17:04 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:45 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided) (duration: 00m 12s)
* 16:45 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided)
* 16:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 16:37 eoghan@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
* 16:37 eoghan@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
* 16:35 vgutierrez: rolling downgrade to HAProxy 2.6.9 in text@esams - [[phab:T332796|T332796]]
* 16:24 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 16:19 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 16:18 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 16:18 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dborch1001.wikimedia.org with OS bullseye
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
* 15:53 moritzm: uploaded druid 0.19.wmf0-2 to bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]]
* 15:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
* 15:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
* 15:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
* 15:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
* 15:39 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:31 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:30 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1001.wikimedia.org with OS bullseye
* 15:27 elukey: `racadm racreset` for kafka-main2004 (no http idrac available for the cookbook, ssh one available)
* 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:26 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 15:25 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:25 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 15:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 15:22 hnowlan: removing java packages from maps hosts
* 15:17 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 15:17 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:13 hnowlan: removing cassandra packages from maps hosts
* 15:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 14:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:21 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45917 and previous config saved to /var/cache/conftool/dbconfig/20230322-141923-root.json
* 14:17 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 14:17 sukhe: enable Puppet on A:wikidough to roll out dnsdist.conf change
* 14:13 sukhe: disable Puppet on A:wikidough to roll out dnsdist.conf change
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45916 and previous config saved to /var/cache/conftool/dbconfig/20230322-140418-root.json
* 14:02 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45915 and previous config saved to /var/cache/conftool/dbconfig/20230322-134913-root.json
* 13:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45914 and previous config saved to /var/cache/conftool/dbconfig/20230322-133409-root.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45913 and previous config saved to /var/cache/conftool/dbconfig/20230322-131904-root.json
* 13:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG (duration: 00m 12s)
* 13:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG
* 13:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45912 and previous config saved to /var/cache/conftool/dbconfig/20230322-130359-root.json
* 13:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:44 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 12:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 12:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:30 marostegui: Poweroff db1121 (lag will show on wikireplicas for s4 section) [[phab:T323961|T323961]]
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool needs to be rebooted [[phab:T323961|T323961]]', diff saved to https://phabricator.wikimedia.org/P45910 and previous config saved to /var/cache/conftool/dbconfig/20230322-112031-root.json
* 11:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 11:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 11:15 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
* 11:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 11:09 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:02 jbond: upgraded prometheus-ipmi-exporter on buster and bullseye
* 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:36 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:34 elukey: `racadm racreset` for kafka-main2005 - http idrac not available (ssh on works fine)
* 10:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:29 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:26 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 10:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1004.eqiad.wmnet with OS bullseye
* 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
* 09:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1004.eqiad.wmnet with OS bullseye
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
* 09:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
* 09:23 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
* 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
* 09:11 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:10 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:02 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
* 08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
* 08:52 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 08:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:25 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:24 XioNoX: deploy measure-$site.wikimedia.org CNAMES
* 08:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 08:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 08:18 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 08:17 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141082
* 07:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141082
* 00:57 zabe@deploy2002: Finished scap: update interwiki cache (duration: 07m 02s)
* 00:50 zabe@deploy2002: Started scap: update interwiki cache
* 00:47 zabe@deploy2002: Finished scap: [[phab:T332115|T332115]] (duration: 06m 56s)
* 00:40 zabe@deploy2002: Started scap: [[phab:T332115|T332115]]
* 00:40 zabe: create Wikipedia Angika (anpwiki) # [[phab:T332115|T332115]]
* 00:38 zabe@deploy2002: Finished scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] (duration: 27m 00s)
* 00:29 zabe@deploy2002: zabe: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 00:11 zabe@deploy2002: Started scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]]


== 2020-05-06 ==
== 2023-03-21 ==
* 23:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable GrowthExperiments guidance on testwiki (duration: 01m 07s)
* 23:46 zabe@deploy2002: Finished scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] (duration: 30m 08s)
* 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable password-reset-update on Wikipedias ([[phab:T245791|T245791]]) (duration: 01m 07s)
* 23:35 zabe@deploy2002: zabe: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 22:22 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/includes/revisionlist/RevisionItem.php: [[gerrit:594803{{!}}RevisionItem: Fix providing timestamp in getRevisionLink ]] (duration: 01m 09s)
* 23:15 zabe@deploy2002: Started scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]]
* 21:45 andrewbogott: updating puppet compiler facts
* 23:07 zabe@deploy2002: Finished scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]] (duration: 07m 10s)
* 21:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:00 zabe@deploy2002: Started scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]]
* 21:05 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:47 ejegg: payments-wiki upgraded from {{Gerrit|0fd66b1f}} to {{Gerrit|ab0a55a2}}
* 21:04 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] (duration: 07m 15s)
* 20:35 ejegg: updated Fundraising CiviCRM from {{Gerrit|b15b2cfbb5}} to {{Gerrit|cfb6101e39}}
* 22:04 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 19:08 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 01m 08s)
* 22:03 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]]
* 19:07 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
* 21:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 19:03 brennen: CORRECTION: 1.35.0-wmf.31 train unblocked ([[phab:T249963|T249963]]), rolling forward to group1
* 21:21 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 19:03 brennen: 1.35.0-wmf.31 train unblocked ([[phab:T249963|T249963]]), rolling forward to group0
* 21:02 AndyRussG: update SmashPig config {{Gerrit|6e651fd4}} -> {{Gerrit|035f602a}}
* 18:58 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594778/ fixes UBN [[phab:T252052|T252052]] (duration: 01m 09s)
* 20:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 18:54 volans: upgraded spicerack to spicerack_0.0.34-1_amd64.deb on cumin[12]001
* 20:48 taavi: start [[phab:T315510|T315510]] migration script on group2 s7 wikis
* 18:45 volans: uploaded spicerack_0.0.34-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 20:39 taavi@deploy2002: Finished scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] (duration: 09m 01s)
* 18:44 volans@deploy1001: Finished deploy [homer/deploy@8224f0a]: Release v0.2.2 (duration: 00m 18s)
* 20:31 taavi@deploy2002: matmarex and taavi: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 18:44 volans@deploy1001: Started deploy [homer/deploy@8224f0a]: Release v0.2.2
* 20:30 taavi@deploy2002: Started scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]]
* 18:28 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594768/ fixes [[phab:T252043|T252043]] (duration: 01m 08s)
* 20:20 taavi@deploy2002: Finished scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] (duration: 17m 40s)
* 17:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:10 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:09 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 17:12 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:04 taavi@deploy2002: esanders and taavi and matmarex: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 17:06 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:02 taavi@deploy2002: Started scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]]
* 17:05 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 19:52 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 16:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:41 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 19:43 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:27 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 19:41 jhathaway@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host dborch1002.wikimedia.org with OS bullseye
* 13:36 mutante: puppetmaster - revoking cert for webserver-misc-apps, recreating it with static-codereview.wikimedia.org as additional SAN ([[phab:T243056|T243056]])
* 19:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 13:32 hashar: Restarting CI Jenkins
* 19:09 dancy@deploy2002: Installation of scap version "4.47.1" completed for 587 hosts
* 13:27 mutante: puppetmaster - revoking cert for webserver-misc-static, not used anymore, merged into webserver-misc-apps
* 19:07 dancy@deploy2002: Installing scap version "4.47.1" for 587 hosts
* 13:27 moritzm: installing graphicsmagick security updates
* 19:04 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
* 13:26 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki2001 - [[phab:T252010|T252010]]
* 19:03 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag (duration: 00m 14s)
* 13:25 XioNoX: add routinator 3000 0.7.0 to buster-wikimedia - [[phab:T252010|T252010]]
* 19:03 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag
* 13:19 ema: cp: upgrade purged to v0.10
* 19:01 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
* 13:08 godog: start swift decom ms-be101[678] - [[phab:T252008|T252008]]
* 18:52 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1002.wikimedia.org with OS bullseye
* 11:22 kart_: EU SWAT done.
* 18:38 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 11:13 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:594668{{!}}Enable ContentTranslation in Armenian WP as a default tool (T249229)]] (duration: 01m 08s)
* 18:36 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]
* 10:27 ema: cp2027: test purged v0.10
* 18:00 AndyRussG: update SmashPig config {{Gerrit|59a8b2d2}} -> {{Gerrit|6e651fd}}
* 10:20 moritzm: restarting apache on dbmonitor/grafana/miscweb/graphite/netmon to pick up openldap update
* 17:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dborch1002.wikimedia.org
* 10:00 moritzm: installing remaining openldap security updates (client-side libs, tools)
* 17:40 joal@deploy2002: Finished deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] (duration: 00m 11s)
* 09:52 jbond42: enable remember-me feature of CAS
* 17:39 joal@deploy2002: Started deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b]
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 and remove db1103:3314 from vslow in s4', diff saved to https://phabricator.wikimedia.org/P11159 and previous config saved to /var/cache/conftool/dbconfig/20200506-093940-marostegui.json
* 17:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-client1002.eqiad.wmnet
* 09:12 marostegui: Upgrade package on s3 and s7 master (db1123 and db1086) in preparation for tomorrow's restart - [[phab:T251158|T251158]]
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 08:56 jbond42: restarting ps1-a4-eqiad.mgmt.eqiad.wmnet.
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 08:53 jynus: kill FTWRL on db2101
* 16:53 mutante: sudo cumin -b 4 -s 40 'C:role::cache::text' 'run-puppet-agent'
* 08:43 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Reverting change on mw1407 [[phab:T99740|T99740]] (duration: 01m 16s)
* 16:50 jbond: copy /usr/bin/prometheus-ipmi-exporter from bullseye to buster
* 08:02 _joe_: restarted php-fpm with tweaked parameters on mw1407, now briefly pooling for traffic ([[phab:T99740|T99740]])
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 07:38 kormat@cumin1001: dbctl commit (dc=all): 'Set es1023 (es5 master) to 0 weight after reimaging es1024 [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11158 and previous config saved to /var/cache/conftool/dbconfig/20200506-073856-kormat.json
* 16:46 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 07:32 vgutierrez: downgrade to ATS 8.0.7-1wm3 on cp4026, cp4031, cp5006 and cp5011
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:00 elukey: powercycle analytics1060 - host stuck - [[phab:T251973|T251973]]
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1103:3314 in vslow on s4 while db1121 is out [[phab:T250055|T250055]]', diff saved to https://phabricator.wikimedia.org/P11157 and previous config saved to /var/cache/conftool/dbconfig/20200506-050340-marostegui.json
* 16:45 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
* 05:02 marostegui: Deploy schema change on db1121
* 16:43 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 16:43 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
* 16:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:28 jbond: upload prometheus-ipmi-exporter_1.6.1 to bullseye
* 16:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-test-client1002.eqiad.wmnet on all recursors
* 16:15 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-test-client1002.eqiad.wmnet on all recursors
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
* 16:13 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
* 16:10 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
* 16:10 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-client1002.eqiad.wmnet
* 15:57 jynus: running from cumin1001: transfer.py --type=decompress dbprov1003.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s5.2023-03-20--04-00-30.tar.gz db1145.eqiad.wmnet:/srv/sqldata.s5
* 15:53 jhathaway@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.wikimedia.org
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 15:53 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 15:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 15:52 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:42 jbond: stop puppet from deploying this further
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
* 15:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
* 15:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] (duration: 09m 11s)
* 15:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:19 samtar@deploy2002: samtar: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:17 samtar@deploy2002: Started scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]]
* 15:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:10 samtar@deploy2002: Finished scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] (duration: 09m 32s)
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 15:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:02 samtar@deploy2002: samtar: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:02 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 15:00 samtar@deploy2002: Started scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]]
* 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 14:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 14:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=kartotherian,name=maps1005.eqiad.wmnet
* 14:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=maps1005.eqiad.wmnet
* 14:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 14:38 hnowlan: disabling puppet on maps* before merging 760619
* 14:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:27 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:17 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:14 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] (duration: 07m 53s)
* 14:10 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:02 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]]
* 14:00 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:58 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:40 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:33 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:28 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:25 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:21 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:16 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 13:05 elukey: move kafka mirror maker instances to PKI migration settings (new truststores) - [[phab:T319372|T319372]]
* 11:20 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:09 joal: Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00
* 11:08 joal: Kill mediacounts_load oozie job
* 11:07 joal: Unpause mediawiki_history_denormalize airflow job
* 11:06 joal: Kill mediawiki_denormalize oozie job
* 11:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s)
* 11:04 joal@deploy2002: Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b]
* 10:43 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:32 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:24 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s)
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9]
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s)
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9]
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s)
* 10:14 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9]
* 09:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
* 09:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
* 09:25 phedenskog@deploy2002: Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s)
* 09:25 phedenskog@deploy2002: Started deploy [performance/navtiming@d2b97ad]: (no justification provided)
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 08:31 elukey: move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - [[phab:T319372|T319372]]
* 06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
* 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
* 03:57 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s)
* 03:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]] (duration: 52m 38s)
* 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]]


== 2020-05-05 ==
== 2023-03-20 ==
* 23:44 catrope@deploy1001: Synchronized wmf-config/flaggedrevs.php: Restore the reviewer group on fawiki ([[phab:T249643|T249643]]) (duration: 01m 06s)
* 22:00 samtar@deploy2002: Finished scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] (duration: 09m 45s)
* 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3) (duration: 00m 11s)
* 21:52 samtar@deploy2002: jdlrobson and samtar: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:22 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3)
* 21:50 samtar@deploy2002: Started scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]]
* 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 14s)
* 21:34 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki shwiki --fix` [[phab:T332614|T332614]]
* 23:21 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
* 21:25 TheresNoTime: closing UTC late backport window, extended
* 23:21 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 20s)
* 21:22 samtar@deploy2002: Finished scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] (duration: 12m 22s)
* 23:20 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
* 21:11 samtar@deploy2002: samtar and aleksandar: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 22:00 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: [[phab:T251952|T251952]] take 2 (duration: 01m 06s)
* 21:10 samtar@deploy2002: Started scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]]
* 21:57 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: [[phab:T251952|T251952]] (duration: 01m 05s)
* 21:09 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer (duration: 00m 13s)
* 21:55 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/SpecialNewpages.php: [[phab:T251950|T251950]] (duration: 01m 06s)
* 21:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer
* 20:02 herron: added ryankemper to wmf and ops ldap groups [[phab:T251572|T251572]]
* 21:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] (duration: 08m 34s)
* 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 08s)
* 21:02 samtar@deploy2002: matmarex and samtar: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 19:38 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 21:00 TheresNoTime: extending UTC late backport window
* 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 25m 18s)
* 21:00 samtar@deploy2002: Started scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]]
* 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
* 20:58 kharlan@deploy2002: Finished scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] (duration: 10m 28s)
* 19:13 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 20:49 kharlan@deploy2002: kharlan: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmn
* 19:12 brennen@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.31 (duration: 97m 23s)
* 20:47 kharlan@deploy2002: Started scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]]
* 19:02 brennen: train status: 1.35.0-wmf.31: presently pressing enter through scap-cdb-rebuild; at 8% ([[phab:T249963|T249963]], [[phab:T223287|T223287]])
* 19:49 mutante: miscweb1003 - manually edit /srv/deployment/iegreview/iegreview-cache/.config and replace tin.eqiad.wmnet with deployment.eqiad.wmnet (which is an alias for deploy2002.codfw.wmnet) [[phab:T257317|T257317]] [[phab:T332623|T332623]] [[phab:T331896|T331896]]
* 18:39 cdanis: depool mw2221 for some manual testing
* 19:13 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator (duration: 00m 13s)
* 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 09s)
* 19:13 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator
* 18:35 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 18:56 ejegg: switched back to new PayPal pending transaction resolver
* 18:34 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 18m 54s)
* 18:48 akosiaris@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 28s)
* 18:15 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 18:47 akosiaris: emergency rollover of redis password complete
* 17:35 brennen@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.31
* 18:45 akosiaris: re-enable puppet on rdb*, netbox*, ores*, registry*
* 16:48 brennen: 1.35.0-wmf.31 was branched at {{Gerrit|4d3fed31a435e7bd24925a154f89a9407670986d}} for [[phab:T249963|T249963]]
* 18:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script (duration: 00m 13s)
* 16:34 brennen: triggering branch cut for 1.35.0-wmf.31 ([[phab:T249963|T249963]]) via https://releases-jenkins.wikimedia.org/job/MediaWiki%20Train%20Branch%20Cut/build?delay=0sec
* 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script
* 16:18 brennen: notice: planning branch cut for 1.35.0-wmf.31 ([[phab:T249963|T249963]]) at 16:30 UTC
* 18:42 ejegg: civicrm upgraded from {{Gerrit|3d3606f1}} to {{Gerrit|09373b9d}}
* 15:47 cstone: SmashPig revision changed from {{Gerrit|8c30ed7fe5}} to {{Gerrit|cd1a49da5f}}
* 18:32 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 100% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11153 and previous config saved to /var/cache/conftool/dbconfig/20200505-153843-kormat.json
* 18:32 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 15:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 18:32 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:31 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 18:30 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 14:58 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb (duration: 01m 31s)
* 18:30 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 14:56 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb
* 18:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 14:45 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 18:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 14:43 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 18:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 14:32 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 75% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11149 and previous config saved to /var/cache/conftool/dbconfig/20200505-143158-kormat.json
* 18:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 13:46 akosiaris: deploy cxserver chart 0.0.15 to staging, codfw, eqiad. [[phab:T219921|T219921]]
* 18:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 13:45 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 18:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 13:41 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 18:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 13:41 hashar: Updated Jenkins job https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler to have it defined in JJB # [[phab:T97513|T97513]]
* 18:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 13:36 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 18:16 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 13:18 vgutierrez: upgrade ATS to version 8.1 () on cp4026, cp4032, cp5006 and cp5011
* 18:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 50% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11147 and previous config saved to /var/cache/conftool/dbconfig/20200505-131520-kormat.json
* 18:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 12:52 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 at 25% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11145 and previous config saved to /var/cache/conftool/dbconfig/20200505-125254-kormat.json
* 18:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 12:37 XioNoX: push pfw policy - [[phab:T251769|T251769]]
* 18:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 12:07 jbond42: updating cas login page
* 18:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 12:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 12:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 12:03 moritzm: rolling restart of apache on puppetboard* to pick up OpenLDAP update
* 18:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 11:47 moritzm: rolling restart of apache on kibana hosts
* 18:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 11:41 mutante: LDAP - added eamedia to wmf group ([[phab:T251358|T251358]])
* 18:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11144 and previous config saved to /var/cache/conftool/dbconfig/20200505-113152-marostegui.json
* 18:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11143 and previous config saved to /var/cache/conftool/dbconfig/20200505-113100-marostegui.json
* 18:05 mutante: miscweb1003 - syntax error in httpd config due to "Unknown Authn provider: ldap" - comes from static-rt vhost ([[phab:T331896|T331896]])
* 11:30 marostegui: Drop [[phab:T248086|T248086]]_wb_terms table on labsdb hosts - [[phab:T248086|T248086]]
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
* 11:26 moritzm: rolling restart of apache/FPM on mw1261-mw1265
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
* 11:22 kart_: EU SWAT done.
* 17:59 mutante: when applying apache role for the first time on new hosts we still have the same old conflict: miscweb1003 - manual "a2dismod mpm_event" to be able to let puppet enable mod PHP ([[phab:T196968|T196968]])
* 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592479{{!}}Adjust ContentTranslation MT threshold for Chinese WP to 70% (T246383)]] (duration: 01m 01s)
* 17:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
* 11:01 moritzm: installing remaining openldap security updates (client-side libs, tools)
* 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
* 11:00 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1024 for reimaging, add es1023 (master) for reading in the meantime [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11141 and previous config saved to /var/cache/conftool/dbconfig/20200505-110031-kormat.json
* 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11140 and previous config saved to /var/cache/conftool/dbconfig/20200505-104540-marostegui.json
* 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11139 and previous config saved to /var/cache/conftool/dbconfig/20200505-104441-marostegui.json
* 17:26 akosiaris: disable puppet on rdb*, netbox*, ores*, registry*
* 10:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
* 10:23 arturo: copy prometheus-rabbitmq-exporter v0.4 from stretch-wikimedia to buster-wikimedia in apt1001 ([[phab:T251660|T251660]])
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
* 10:18 arturo: copy prometheus-pdns-exporter v0.5.1 from stretch-wikimedia to buster-wikimedia in apt1001 ([[phab:T251575|T251575]])
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 10:16 mutante: temp disabling puppet on all ganeti hosts to carefully deploy change related to rapi cert location
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 09:36 moritzm: removing boron.eqiad.wmnet
* 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 09:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 16:36 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 09:03 gehel: restarting wdqs updater on all servers
* 16:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 08:53 moritzm: installing Java security updates on releases*
* 16:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 08:44 kormat: reimaging es1024 to buster [[phab:T250666|T250666]]
* 16:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 08:27 ema: cp2028 and cp2030 (both upload): varnish-fe restart to clear cache and evaluate 'exp' admission policy [[phab:T144187|T144187]] [[phab:T249809|T249809]]
* 16:21 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 08:26 moritzm: upgrading slapd on serpens/seaborgium
* 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 08:19 ema: cp2027 and cp2029 (both text): varnish-fe restart to clear cache and evaluate 'exp' admission policy [[phab:T144187|T144187]] [[phab:T249809|T249809]]
* 15:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 08:08 moritzm: installing Java security updates on notebook/stat hosts
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:54 gehel@deploy1001: Finished deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22 (duration: 04m 18s)
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:50 gehel@deploy1001: Started deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22
* 14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 07:36 zpapierski@deploy1001: Started deploy [wdqs/wdqs@d37a059]: fix for the duplicated jars
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 06:59 addshore: depool wdqs1006 heavy lag
* 14:53 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only=off for maintenance [[phab:T251154|T251154]]', diff saved to https://phabricator.wikimedia.org/P11133 and previous config saved to /var/cache/conftool/dbconfig/20200505-052334-marostegui.json
* 14:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only for maintenance [[phab:T251154|T251154]]', diff saved to https://phabricator.wikimedia.org/P11132 and previous config saved to /var/cache/conftool/dbconfig/20200505-052058-marostegui.json
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2552
* 05:19 marostegui: Start s5 and s6 maintenance - [[phab:T251154|T251154]]
* 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2552
* 04:39 marostegui: Restart mysql on tendril host: db1115 - [[phab:T231769|T231769]]
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 and promote es2027 to es3 master', diff saved to https://phabricator.wikimedia.org/P45896 and previous config saved to /var/cache/conftool/dbconfig/20230320-143951-root.json
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]]
* 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]]
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:17 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:11 TheresNoTime: close UTC afternoon backport window
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
* 14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
* 14:08 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autopatrol' 'autopatrolled'` [[phab:T331762|T331762]]
* 14:06 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:05 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreview' 'autopatrol'` [[phab:T331762|T331762]]
* 14:03 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki slwiki --fix` [[phab:T332351|T332351]]
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'reviewer' 'patrol'` [[phab:T331762|T331762]]
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreviewer' 'autopatrol'` ("nothing to do") [[phab:T331762|T331762]]
* 14:00 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki ptwikisource editor` [[phab:T331762|T331762]]
* 13:58 samtar@deploy2002: Finished scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] (duration: 09m 44s)
* 13:50 samtar@deploy2002: thiemowmde and samtar and zoranzoki21: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:49 samtar@deploy2002: Started scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]]
* 13:47 samtar@deploy2002: Finished scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] (duration: 09m 26s)
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host cuminunpriv1001.eqiad.wmnet with OS bullseye
* 13:39 samtar@deploy2002: aleksandar and samtar: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:38 samtar@deploy2002: Started scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]]
* 13:37 samtar@deploy2002: Finished scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] (duration: 08m 46s)
* 13:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
* 13:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
* 13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
* 13:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
* 13:30 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 30s)
* 13:29 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
* 13:29 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}}
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]]
* 13:28 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:26 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 39s)
* 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
* 13:24 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}}
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] (duration: 11m 36s)
* 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host cuminunpriv1001.eqiad.wmnet with OS bullseye
* 13:17 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 13:17 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 13:14 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:14 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 13:08 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:06 samtar@deploy2002: Started scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]]
* 11:35 krinkle@deploy2002: Synchronized php-1.40.0-wmf.27/includes/libs/rdbms/: (no justification provided) (duration: 15m 28s)
* 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
* 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
* 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141082
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58655
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58655
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2552
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2552
* 09:21 claime: Repooling parse2004 - [[phab:T332119|T332119]]
* 08:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 138915
* 08:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 138915
* 08:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138915
* 08:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138915


== 2020-05-04 ==
== 2023-03-19 ==
* 23:38 mstyles@deploy1001: Finished deploy [wdqs/wdqs@6518a8d]: v.0.3.26 (duration: 14m 39s)
* 18:27 AndyRussG: update config (to re-enable old PayPal orphan slayer job) {{Gerrit|27a5b481}} -> {{Gerrit|6359222d}}
* 23:37 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Use namespaced EventBus classes (duration: 00m 57s)
* 16:44 apergos: dumpsdata1005 conversion to primary dumps nfs server done
* 23:35 reedy@deploy1001: Synchronized wmf-config/logging.php: Use namespaced EventBus classes (duration: 00m 56s)
* 15:12 AndyRussG: update config (to disable paypal_ec pending transaction resolver) {{Gerrit|5dd37c9c}} -> {{Gerrit|3d3606f1}}
* 23:33 reedy@deploy1001: Synchronized rpc/RunSingleJob.php: Use namespaced EventBus classes (duration: 00m 58s)
* 14:18 apergos: work starting now to swap dumpsdata1005 in for primary nfs server, replacing dumpsdata1003 which will become dumps spare host
* 23:29 reedy@deploy1001: Synchronized wmf-config/logging.php: Replace AuthManagerStatsdHandler with WikimediaEventsAuthManagerStatsdHandler::class (duration: 00m 57s)
* 00:17 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
* 23:23 mstyles@deploy1001: Started deploy [wdqs/wdqs@6518a8d]: v.0.3.26
* 00:17 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 22:42 sbassett@deploy1001: Synchronized private/PrivateSettings.php: [[phab:T251835|T251835]]: Restore {{Gerrit|dc752af1e94684faacbe9662789815c6edbbdf46}} (duration: 00m 57s)
* 22:16 eileen: process-control config revision is {{Gerrit|2eb75f8dff}}
* 22:06 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 21:45 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Revert partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 21:41 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deploy partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - [[phab:T249822|T249822]], [[phab:T238086|T238086]] (duration: 00m 05s)
* 18:19 dpifke@deploy1001: Started deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - [[phab:T249822|T249822]], [[phab:T238086|T238086]]
* 18:16 Urbanecm: Morning SWAT done
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c04fbdd}}: Adding upload_by_url user right to all registered users on Commons ([[phab:T251474|T251474]]) (duration: 00m 57s)
* 18:11 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/DiscussionTools/includes/DiscussionToolsHooks.php: SWAT: {{Gerrit|b85fc16}}: Enable on all ExtraSignaturesNamespaces ([[phab:T249036|T249036]]) (duration: 01m 00s)
* 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|18c1efb}}: Load DiscussionTools on en.wiki ([[phab:T249376|T249376]]) (duration: 00m 58s)
* 17:57 XioNoX: configure singtel interface on cr1-eqsin
* 17:36 volans: upgraded spicerack on cumin[12]001 to 0.0.33-1
* 17:02 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [{{Gerrit|2252f9a}}] (duration: 00m 09s)
* 17:02 joal@deploy1001: Started deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [{{Gerrit|2252f9a}}]
* 17:01 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [{{Gerrit|2252f9a}}] (duration: 16m 45s)
* 16:44 joal@deploy1001: Started deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [{{Gerrit|2252f9a}}]
* 16:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.30
* 15:59 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 05s)
* 15:58 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30
* 15:53 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 15:53 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
* 15:53 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 15:52 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 15:52 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 15:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2025 after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11128 and previous config saved to /var/cache/conftool/dbconfig/20200504-154747-kormat.json
* 15:45 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/includes/libs/rdbms/database/DatabaseMysqlBase.php: [[phab:T251457|T251457]] rdbms: don't treat lock() as a write operation (duration: 01m 04s)
* 15:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/resources/src/mediawiki.diff.styles/diff.less: [[phab:T250393|T250393]] Follow-up {{Gerrit|I07dd6f7}}: Fix font size in diff (duration: 01m 05s)
* 15:34 volans: uploaded spicerack_0.0.33-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 15:26 volans: deploy1001: deleted old .hhvm.hhbc files (/home/*/.hhvm.hhbc) https://phabricator.wikimedia.org/P11127
* 15:23 volans: deploy1001: deleted old .hhvm.hhbc files moved from tin (/home/*/home-tin/.hhvm.hhbc) https://phabricator.wikimedia.org/P11126
* 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 fully after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11125 and previous config saved to /var/cache/conftool/dbconfig/20200504-151243-kormat.json
* 15:11 ppchelko@deploy1001: Finished deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints (duration: 14m 36s)
* 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [{{Gerrit|3396279}}] (duration: 00m 10s)
* 15:05 joal@deploy1001: Started deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [{{Gerrit|3396279}}]
* 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [{{Gerrit|3396279}}] (duration: 15m 07s)
* 15:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:57 ppchelko@deploy1001: Started deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints
* 14:50 joal@deploy1001: Started deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [{{Gerrit|3396279}}]
* 14:19 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 fully and db1101:3318 to 75% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11123 and previous config saved to /var/cache/conftool/dbconfig/20200504-141919-kormat.json
* 14:15 XioNoX: add static nat for fran1001 - [[phab:T251763|T251763]]
* 13:50 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2025 for reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11122 and previous config saved to /var/cache/conftool/dbconfig/20200504-135039-kormat.json
* 13:34 kormat: reimaging es2025 to buster [[phab:T250666|T250666]]
* 13:27 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 some more after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11121 and previous config saved to /var/cache/conftool/dbconfig/20200504-132744-kormat.json
* 13:02 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T248664|T248664]] Stop setting legacy wmgWikibase(Repo/Client)Repositories for TEST wikis (duration: 01m 06s)
* 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11120 and previous config saved to /var/cache/conftool/dbconfig/20200504-124659-kormat.json
* 12:10 marostegui: Temporarily enable slow query log on db1099:3311 - [[phab:T206103|T206103]]
* 12:09 Amir1: EU SWAT is done
* 11:53 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592761{{!}}Increase wmgMemoryLimit from 660MB to 666MB]] (duration: 01m 06s)
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 [[phab:T206103|T206103]] after removing tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11119 and previous config saved to /var/cache/conftool/dbconfig/20200504-114727-marostegui.json
* 11:46 tgr@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594134{{!}}Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 06s)
* 11:46 marostegui: Remove index tmp_2 from recentchanges on db1099:3311 [[phab:T206103|T206103]]
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T206103|T206103]] to remove tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11118 and previous config saved to /var/cache/conftool/dbconfig/20200504-114539-marostegui.json
* 11:43 tgr@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594137{{!}}Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 10s)
* 11:38 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
* 11:30 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4d00236}}: Enable cross-project search on frwikibooks ([[phab:T251683|T251683]]) (duration: 01m 05s)
* 11:25 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/elwikiversity*.png ([[phab:T251050|T251050]])
* 11:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|64556ba}}: Correct typo in Greek Wikiversity logo ([[phab:T248391|T248391]]) (duration: 01m 06s)
* 11:20 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/jvwiki*.png ([[phab:T251050|T251050]])
* 11:20 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|3b8c618}}: Update jvwiki logos ([[phab:T251050|T251050]]) (duration: 01m 05s)
* 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|cc94ea7}}: Enable VisualEditor for more namespaces on vecwiki ([[phab:T250419|T250419]]) (duration: 01m 07s)
* 10:49 arturo: update packages in buster-wikimedia {{!}} thirdparty/kubeadm-k8s-1-15 and thirdparty/kubeadm-k8s-1-16 ([[phab:T250866|T250866]])
* 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:594128{{!}} Bumping portals to master (563985)]] (duration: 01m 05s)
* 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:594128{{!}} Bumping portals to master (563985)]] (duration: 01m 29s)
* 10:39 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm3
* 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:30 arturo: running `aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished` to cleanup buster-wikimedia{{!}}thirdparty/kubeadm-k8s ([[phab:T250866|T250866]])
* 09:46 vgutierrez: upload trafficserver 8.0.7-1wm2 to apt.wm.o (buster)
* 09:22 kormat: reimaging db1101 to buster [[phab:T250666|T250666]]
* 08:50 XioNoX: configure BGP peering with AS132203
* 08:20 godog: add 50G to prometheus-ops on prometheus100[34]
* 08:17 marostegui: Deploy schema change on s5 codfw - [[phab:T251188|T251188]]
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for reimage', diff saved to https://phabricator.wikimedia.org/P11113 and previous config saved to /var/cache/conftool/dbconfig/20200504-075148-marostegui.json
* 07:31 marostegui: Drop unused flagged* tables from mediawikiwiki - [[phab:T248298|T248298]]
* 07:26 moritzm: removed jmorgan from cn=wmf
* 07:24 marostegui: Install 10.1.43-2 on s5 (db110) and s6 (db1131) masters in preparation for tomorrow's restart - [[phab:T251154|T251154]]
* 07:24 moritzm: removed Kerberos principal for lexnasser and jmorgan
* 07:23 moritzm: removed lexnasser from cn=nda
* 07:07 elukey: execute ifdown eno1; ifup eno1 on analytics1052 - interface neg speed flapping
* 06:41 elukey: upload prometheus-druid-exporter 0.8-1 to stretch-wikimedia


== 2020-05-03 ==
== 2023-03-18 ==
* 22:52 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/593929
* 22:47 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
* 22:42 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/591459
* 22:47 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 21:37 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service (duration: 04m 22s)
* 14:26 apergos: rsync of xmldata public dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
* 21:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service
* 13:46 apergos: rsync of xmldata private dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
* 07:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 07:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 02:57 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
* 02:57 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 01:21 urandom: powercycling restbase2025 — [[phab:T332462|T332462]]
* 00:06 AndyRussG: Updating civicrm from {{Gerrit|5dd37c9c}} to {{Gerrit|3d3606f1}}


== 2020-05-02 ==
== 2023-03-17 ==
* 07:49 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(49{{!}}5[0-9]{{!}}6[0-2])\.eqiad\.wmnet
* 19:53 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching (duration: 00m 13s)
* 07:08 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 0 member 1
* 19:53 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching
* 02:36 volker-e@deploy1001: Finished deploy [design/style-guide@f0d467b]: Deploy design/style-guide: (duration: 00m 07s)
* 19:52 bd808: Testing Mastodon account changes. This should post to @wikimedia_sal@botsin.space
* 02:36 volker-e@deploy1001: Started deploy [design/style-guide@f0d467b]: Deploy design/style-guide:
* 19:06 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch (duration: 00m 13s)
* 19:06 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch
* 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
* 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
* 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
* 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
* 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
* 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
* 18:10 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
* 18:09 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
* 17:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
* 17:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
* 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet
* 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet
* 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
* 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
* 15:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:29 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 15:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 14:54 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:13 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 13:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2004.codfw.wmnet
* 13:21 claime: Depooling parse2004.codfw.wmnet for broken PSU - [[phab:T332119|T332119]]
* 12:06 mutante: systemctl reset-failed on gitlab-runner*
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 11:03 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:02 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl', diff saved to https://phabricator.wikimedia.org/P45887 and previous config saved to /var/cache/conftool/dbconfig/20230317-055643-marostegui.json
* 02:10 ejegg: civicrm upgraded from {{Gerrit|672950d9}} to {{Gerrit|5dd37c9c}}
* 01:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2010.codfw.wmnet
* 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2010.codfw.wmnet
* 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
* 00:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
* 00:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
* 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
* 00:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates
* 00:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates


== 2020-05-01 ==
== 2023-03-16 ==
* 19:56 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 23:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
* 18:57 gehel: restart blazegraph on wdqs1006 - [[phab:T242453|T242453]]
* 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11110 and previous config saved to /var/cache/conftool/dbconfig/20200501-142354-marostegui.json
* 23:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
* 14:18 hknust: holger@mwmaint1002 finished renameInvalidUsernames.php (fail) as part of [[phab:T219279|T219279]]
* 23:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11109 and previous config saved to /var/cache/conftool/dbconfig/20200501-140603-marostegui.json
* 23:31 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb2003.codfw.wmnet with OS bullseye
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11108 and previous config saved to /var/cache/conftool/dbconfig/20200501-134707-marostegui.json
* 23:28 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb1003.eqiad.wmnet with OS bullseye
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly warm up db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11107 and previous config saved to /var/cache/conftool/dbconfig/20200501-132804-marostegui.json
* 23:20 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 (duration: 00m 19s)
* 13:06 hknust: holger@mwmaint1002 Starting renameInvalidUsernames.php as part of [[phab:T219279|T219279]]
* 23:20 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0
* 13:01 vgutierrez: rolling restart of ats-tls in text@esams - [[phab:T249335|T249335]]
* 23:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
* 12:24 mutante: mw230* - rolling restart of php-fpm - icinga warnings about opcache health in codfw
* 23:15 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
* 12:20 mutante: mw2376 - restarting php-fpm - icinga warnings about opcache health in codfw
* 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
* 12:07 mutante: notebook1004 - puppet had failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 29038". killing 29038, running puppet [[phab:T251560|T251560]]
* 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
* 12:05 mutante: notebook1003 - puppet had failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 3288". killing 3288, running puppet [[phab:T251560|T251560]]
* 23:01 dzahn@cumin1001: START - Cookbook sre.ganeti.reimage for host miscweb1003.eqiad.wmnet with OS bullseye
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:00 dzahn@cumin2002: START - Cookbook sre.ganeti.reimage for host miscweb2003.codfw.wmnet with OS bullseye
* 11:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb1003.eqiad.wmnet
* 11:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb2003.codfw.wmnet
* 11:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb1003.eqiad.wmnet on all recursors
* 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:39 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache miscweb1003.eqiad.wmnet on all recursors
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
* 08:54 _joe_: depooled all servers in the app pool in rack D1
* 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 08:54 oblivian@cumin1001: conftool action : set/pooled=no:weight=30; selector: name=mw13(49{{!}}5[0-5])\.eqiad\.wmnet
* 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host miscweb1003.eqiad.wmnet
* 08:50 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw13(49{{!}}5[0-5])\.eqiad\.wmnet
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb2003.codfw.wmnet on all recursors
* 08:48 _joe_: repooling mw1407 with LCStoreStaticArray, increased opcache, puppet disabled
* 22:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache miscweb2003.codfw.wmnet on all recursors
* 08:45 _joe_: repooling mw1409
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:39 _joe_: repool mw1352
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
* 08:37 _joe_: depooling mw1352
* 22:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
* 07:44 marostegui: Copy wikireplica dump from labsdb1009 to labsdb1011 - [[phab:T249188|T249188]]
* 22:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 01:36 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service (duration: 04m 33s)
* 22:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host miscweb2003.codfw.wmnet
* 01:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service
* 22:24 ejegg: civicrm upgraded from {{Gerrit|68fa85cf}} to {{Gerrit|672950d9}}
* 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 22:04 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 21:54 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:47 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]]
* 20:36 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): blockers hopefully resolved, rolling to all wikis
* 20:35 TheresNoTime: close UTC late backport window
* 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] (duration: 08m 18s)
* 20:28 samtar@deploy2002: samtar and sharvaniharan: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:26 samtar@deploy2002: Started scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]]
* 20:21 brennen@deploy2002: Finished scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] (duration: 09m 06s)
* 20:14 brennen@deploy2002: brennen and jforrester: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:12 brennen@deploy2002: Started scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]]
* 19:28 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s)
* 19:27 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided)
* 18:41 wfan: enable monthlyconvert for cz
* 18:40 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s)
* 18:40 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided)
* 18:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet
* 18:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 18:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet
* 18:03 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet
* 17:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 17:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 17:40 ayounsi@cumin2002: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 17:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 17:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s)
* 16:58 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade.
* 16:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
* 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
* 16:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:31 Emperor: reboot ms-be2067 again to see if the missing drive comes back
* 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
* 15:39 claime: Pooled new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]]
* 15:28 sukhe: enable puppet on R:class = dnsrecursor to merge CR: 898957 [done]
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
* 15:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
* 15:15 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
* 15:15 claime: Pooling new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]]
* 15:13 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
* 15:12 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
* 15:10 sukhe: disable puppet on R:class = dnsrecursor to merge CR: 898957
* 15:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts
* 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 32 hosts
* 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 14:49 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 14:44 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:06 urandom: ALTER-ing image_suggestions.suggestion table — [[phab:T328670|T328670]]
* 13:35 kostajh: UTC afternoon deploys done
* 13:34 kharlan@deploy2002: Finished scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] (duration: 07m 44s)
* 13:28 kharlan@deploy2002: kharlan: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]]
* 13:15 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] (duration: 09m 48s)
* 13:07 kharlan@deploy2002: kharlan: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:05 kharlan@deploy2002: Started scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]]
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
* 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 12:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
* 11:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
* 11:43 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
* 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
* 11:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
* 11:27 hnowlan@puppetmaster1001: conftool action : set/weight=3; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 32 hosts with reason: new_install
* 11:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 32 hosts with reason: new_install
* 11:10 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
* 11:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
* 10:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:38 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
* 10:33 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 10:32 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 10:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:30 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:28 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 to move it to x1', diff saved to https://phabricator.wikimedia.org/P45885 and previous config saved to /var/cache/conftool/dbconfig/20230316-100945-root.json
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1105.eqiad.wmnet
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:49 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:48 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1105.eqiad.wmnet
* 08:40 kostajh: UTC morning deploys (second round) done
* 08:40 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] (duration: 12m 30s)
* 08:29 kharlan@deploy2002: kharlan: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]]
* 08:11 apergos: additional deployments for the  UTC morning backport and config training window, running into the next hour, so window re-opened
* 07:36 tgr_: UTC morning deploys done
* 07:34 tgr@deploy2002: Finished scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] (duration: 08m 13s)
* 07:28 tgr@deploy2002: tgr: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:26 tgr@deploy2002: Started scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]]
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105 from dbctl [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45883 and previous config saved to /var/cache/conftool/dbconfig/20230316-062307-root.json
* 06:03 marostegui: Failover m5 from db1106 to db1176 - [[phab:T332155|T332155]]
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]]
* 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]]
* 03:29 ejegg: payments-wiki upgraded from {{Gerrit|1532b107}} to {{Gerrit|0fd66b1f}}


==Archives==
See [[Server admin log/Archives]].

== 2023-03-15 ==
* 22:55 tzatziki: Removing 1 file for legal compliance
* 22:30 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 55s)
* 22:29 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]])
* 22:29 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 28s)
* 22:28 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]])
* 22:08 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str (duration: 00m 14s)
* 22:07 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str
* 21:59 brennen: end of phabricator update window ([[phab:T331915|T331915]])
* 21:47 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 40s)
* 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]])
* 21:46 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 28s)
* 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]])
* 21:26 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]]) (duration: 00m 52s)
* 21:25 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]])
* 21:19 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] (duration: 00m 11s)
* 21:19 milimetric@deploy2002: Started deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893]
* 21:13 mutante: phab* - upgrading PHP packages
* 21:13 mutante: phabricator - maintenance window starting - expect possible downtime
* 21:08 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
* 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
* 20:56 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]]) (duration: 00m 31s)
* 20:55 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]])
* 20:54 brennen: starting phabricator window a touch early with a test deploy to phab2002
* 20:51 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor (duration: 00m 16s)
* 20:51 ebernhardson@deploy2002: Started deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor
* 20:48 TheresNoTime: close UTC late backport window
* 20:48 samtar@deploy2002: Finished scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] (duration: 08m 46s)
* 20:41 samtar@deploy2002: matmarex and samtar and esanders: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:39 samtar@deploy2002: Started scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]]
* 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] (duration: 10m 30s)
* 20:33 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3002.wikimedia.org with OS bullseye
* 20:27 samtar@deploy2002: samtar and tsepothoabala: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:25 samtar@deploy2002: Started scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]]
* 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] (duration: 10m 12s)
* 20:20 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bullseye
* 20:17 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bullseye
* 20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
* 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
* 20:15 samtar@deploy2002: sgimeno and samtar: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]]
* 20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
* 20:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 14s)
* 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries
* 20:11 taavi: deploy patch for [[phab:T331192|T331192]]
* 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
* 20:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
* 20:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
* 19:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
* 19:54 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3002.wikimedia.org with OS bullseye
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
* 19:53 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3001.wikimedia.org with OS bullseye
* 19:50 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 19:49 taavi@deploy2002: Finished scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] (duration: 12m 04s)
* 19:48 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1002.wikimedia.org with OS bullseye
* 19:47 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 19:46 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bullseye
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
* 19:45 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2002.wikimedia.org with OS bullseye
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bullseye
* 19:41 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bullseye
* 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
* 19:39 taavi@deploy2002: taavi: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:37 taavi@deploy2002: Started scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]]
* 19:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
* 19:35 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
* 19:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
* 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
* 19:32 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
* 19:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
* 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
* 19:28 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
* 19:27 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
* 19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
* 19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
* 19:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1001.wikimedia.org with OS bullseye
* 19:16 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2001.wikimedia.org with OS bullseye
* 19:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bullseye
* 19:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3001.wikimedia.org with OS bullseye
* 19:05 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6002.wikimedia.org with OS bullseye
* 19:03 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bullseye
* 18:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
* 18:49 mutante: adding new language prefix anp.wikipedia.org - Angika, an Eastern Indo-Aryan language spoken in some parts of the Indian states of Bihar and Jharkhand, as well as in parts of Nepal. ([[phab:T332115|T332115]])
* 18:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
* 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
* 18:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
* 18:25 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6001.wikimedia.org with OS bullseye
* 18:24 brennen@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]] (duration: 06m 08s)
* 18:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 18:19 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5002.wikimedia.org with OS bullseye
* 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]]
* 18:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 05s)
* 18:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries
* 18:06 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group1.
* 18:04 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5001.wikimedia.org with OS bullseye
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
* 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
* 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
* 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.wmnet
* 17:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2006.codfw.wmnet
* 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bullseye
* 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2006.codfw.wmnet
* 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2004.codfw.wmnet
* 17:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2004.codfw.wmnet
* 17:29 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
* 17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
* 17:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
* 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
* 17:12 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5001.wikimedia.org with OS bullseye
* 17:05 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4001.wikimedia.org with OS bullseye
* 16:19 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 16:19 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 16:17 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 16:17 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 16:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
* 16:02 hnowlan: restarted thumbor-instances on thumbor1006
* 16:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 15:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 15:52 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
* 15:49 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
* 15:44 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4002.wikimedia.org with OS bullseye
* 15:34 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
* 15:33 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 15:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 15:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 15:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 14:54 Emperor: depool moss-fe1001 as rate of token denial is too high
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 14:53 claime: Redeploying mw-on-k8s for php7.4 update [[phab:T330270|T330270]]
* 14:52 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 14:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:46 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:41 cgoubert@deploy2002: Started scap: (no justification provided)
* 14:41 claime: Rebuilding mw-on-k8s images - [[phab:T330270|T330270]]
* 14:38 claime: Updating php7.4 production images
* 14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:34 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
* 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
* 14:24 daniel@deploy2002: Finished scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] (duration: 09m 57s)
* 14:22 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 14:22 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 14:22 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=pki
* 14:22 jbond: switch pki to be active active
* 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 14:20 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 14:19 jbond: update pki to use discovery record
* 14:16 jbond@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=pki
* 14:15 daniel@deploy2002: daniel: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:14 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4002.wikimedia.org with OS bullseye
* 14:14 daniel@deploy2002: Started scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]]
* 14:12 sukhe: [correction] depool _doh4002_ for reimaging to bullseye: [[phab:T321309|T321309]]
* 14:12 sukhe: depool dns4002 for reimaging to bullseye: [[phab:T321309|T321309]]
* 14:00 moritzm: nodejs security updates on buster
* 13:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS bullseye
* 13:50 sukhe: reprepro -C component/pdns-recursor include bullseye-wikimedia pdns-recursor_4.6.2-1+wmf11u1_amd64.changes: [[phab:T321309|T321309]]
* 13:49 moritzm: installing graphite-web security updates
* 13:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
* 13:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:28 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 13:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:27 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 13:27 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 13:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
* 13:26 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:24 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:21 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:20 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:17 taavi@deploy2002: Finished scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] (duration: 09m 01s)
* 13:12 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS bullseye
* 13:10 taavi@deploy2002: matmarex and taavi and esanders: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebu
* 13:08 taavi@deploy2002: Started scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]]
* 13:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 13:07 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:18 marostegui: Failover m5 from db1176 to db1106 - [[phab:T331877|T331877]]
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]]
* 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]]
* 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 11:36 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 11:34 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 11:32 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 11:30 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
* 11:27 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 11:26 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
* 11:20 moritzm: imported packages into thirdparty/ceph-quincy
* 11:16 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 11:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 11:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 11:00 claime: Redirecting test.wikidata.org to mw-on-k8s - [[phab:T331268|T331268]]/25
* 10:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:29 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:28 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:25 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:24 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:23 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:22 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:22 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:21 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:20 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:19 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 10:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 10:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 10:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 10:08 jayme@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 10:08 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 09:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 09:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 09:49 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 09:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 09:45 jayme@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
* 09:39 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
* 09:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 09:26 moritzm: rolling restart of FPM/Apache to pick up gnutls28 security updates
* 09:22 moritzm: installing gnutls28 security updates
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl [[phab:T331875|T331875]]', diff saved to https://phabricator.wikimedia.org/P45872 and previous config saved to /var/cache/conftool/dbconfig/20230315-090515-root.json
* 08:40 hashar@deploy2002: Finished deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]] (duration: 00m 19s)
* 08:40 hashar@deploy2002: Started deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]]
* 08:15 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 07:40 tgr_: UTC morning deploys done
* 07:39 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2067.codfw.wmnet
* 07:36 tgr@deploy2002: Finished scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] (duration: 07m 54s)
* 07:30 tgr@deploy2002: tgr: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:28 tgr@deploy2002: Started scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]]
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45870 and previous config saved to /var/cache/conftool/dbconfig/20230315-062643-root.json
* 06:20 marostegui: Remove pki2001 from m1 grants [[phab:T332018|T332018]]
 
== 2023-03-14 ==
* 23:29 brennen@deploy2002: Finished scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] (duration: 10m 32s)
* 23:20 brennen@deploy2002: brennen and umherirrender: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:19 brennen@deploy2002: Started scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]]
* 22:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 22:08 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 21:38 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 21:38 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 21:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 21:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 21:16 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 21:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 21:11 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 21:11 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 20:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 20:47 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 20:43 ejegg: payments-wiki upgraded from {{Gerrit|61c30a4f}} to {{Gerrit|1532b107}}
* 20:35 zabe@deploy2002: Finished scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] (duration: 08m 36s)
* 20:28 zabe@deploy2002: zabe: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:27 zabe@deploy2002: Started scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]]
* 20:04 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 20:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 19:47 topranks: Reboot cloudsw1-b1-codfw to upgrade JunOS version [[phab:T327919|T327919]]
* 19:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
* 19:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
* 19:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 19:30 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): uneventful at group0.  i'm afk for about an hour.
* 19:13 ejegg: civicrm upgraded from {{Gerrit|dbe3b716}} to {{Gerrit|68fa85cf}}
* 18:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS bullseye
* 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
* 18:28 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
* 18:27 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 18:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
* 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 18:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 18:22 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 30s)
* 18:22 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 18:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 18:13 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]]
* 18:13 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS bullseye
* 18:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 18:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 18:03 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group0.
* 17:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 17:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 17:58 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:56 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:56 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:55 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 17:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 16:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
* 16:47 sukhe: rolling restart of pdns-rec in A:wikidough to pick up config changes
* 16:47 sukhe: rolling restart of pdns-rec to pick up config changes
* 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pki2001.codfw.wmnet
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
* 16:13 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
* 16:11 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 16:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
* 16:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
* 16:00 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts pki2001.codfw.wmnet
* 15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS bullseye
* 15:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
* 15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 15:32 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
* 15:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
* 15:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
* 15:19 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS bullseye
* 15:00 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:59 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 14:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 14:53 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 14:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 14:52 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 14:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:51 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:42 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1001.eqiad.wmnet with OS bullseye
* 14:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
* 14:16 claime: All active/active services in eqiad repooled, DNS issues resolved - [[phab:T331541|T331541]]
* 14:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2122 weight', diff saved to https://phabricator.wikimedia.org/P45866 and previous config saved to /var/cache/conftool/dbconfig/20230314-140926-root.json
* 14:01 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki1001.eqiad.wmnet with OS bullseye
* 14:00 jbond: reimage pki1001
* 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 13:33 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results (again, with sukhe's more-correct variant!)
* 13:27 TheresNoTime: close UTC afternoon backport window
* 13:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] (duration: 07m 24s)
* 13:20 samtar@deploy2002: samtar and urbanecm: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:19 samtar@deploy2002: Started scap: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]]
* 13:18 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] (duration: 07m 55s)
* 13:11 samtar@deploy2002: esanders and samtar: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 13:10 samtar@deploy2002: Started scap: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:04 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 13:02 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 12:44 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 12:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45864 and previous config saved to /var/cache/conftool/dbconfig/20230314-123515-marostegui.json
* 12:23 moritzm: installing git security updates
* 12:20 samtar@deploy2002: Finished scap: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] (duration: 09m 12s)
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45863 and previous config saved to /var/cache/conftool/dbconfig/20230314-122009-marostegui.json
* 12:20 TheresNoTime: `Command '['helmfile', '-e', 'eqiad', '--selector', 'name=canary', 'apply']' returned non-zero exit status 1.` (P45862) during scap deployment of [[phab:T297396|T297396]] + [[phab:T331680|T331680]] — scap rolled back
* 12:18 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host pki-root1001.eqiad.wmnet with OS bullseye
* 12:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 12:13 samtar@deploy2002: samtar and varnent: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 12:11 samtar@deploy2002: Started scap: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]]
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
* 12:08 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
* 12:08 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 12:06 claime: Unlocked scap deployments - [[phab:T331541|T331541]]
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45861 and previous config saved to /var/cache/conftool/dbconfig/20230314-120503-marostegui.json
* 12:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 12:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
* 11:51 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
* 11:51 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45860 and previous config saved to /var/cache/conftool/dbconfig/20230314-114957-marostegui.json
* 11:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 11:41 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 11:27 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 11:27 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45857 and previous config saved to /var/cache/conftool/dbconfig/20230314-112354-marostegui.json
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45856 and previous config saved to /var/cache/conftool/dbconfig/20230314-112333-marostegui.json
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
* 11:19 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
* 11:13 claime: We are encountering unexpected DNS anycast issues following [[phab:T331541|T331541]]; latencies are increased but there is no production outage.
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45855 and previous config saved to /var/cache/conftool/dbconfig/20230314-110826-marostegui.json
* 11:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 11:03 akosiaris@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
* 10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45854 and previous config saved to /var/cache/conftool/dbconfig/20230314-105319-marostegui.json
* 10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: [[phab:T331541|T331541]]
* 10:48 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: [[phab:T331541|T331541]]
* 10:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - [[phab:T331541|T331541]]
* 10:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki-root1001.eqiad.wmnet with OS bullseye
* 10:42 jbond: reimage pki-root1001
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45853 and previous config saved to /var/cache/conftool/dbconfig/20230314-103813-marostegui.json
* 10:33 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - [[phab:T331541|T331541]]
* 10:32 claime: Repooling all active/active services in eqiad - [[phab:T331541|T331541]]
* 10:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
* 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 10:28 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
* 10:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=99)
* 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
* 10:28 claime: Running sre.switchdc.mediawiki.00-optional-warmup-caches - [[phab:T331541|T331541]]
* 10:21 jbond: move pki.discovery.wmnet to pki2002 (bullseye)
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45852 and previous config saved to /var/cache/conftool/dbconfig/20230314-101918-marostegui.json
* 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45851 and previous config saved to /var/cache/conftool/dbconfig/20230314-101840-marostegui.json
* 10:15 jayme: enabling puppet on P:calico::kubernetes for [[phab:T325268|T325268]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45850 and previous config saved to /var/cache/conftool/dbconfig/20230314-100334-marostegui.json
* 10:02 claime: Locking scap deployment for service switchover - [[phab:T331541|T331541]]
* 10:00 claime: Locking scap deployment for service switchover - [[phab:T330651|T330651]]
* 09:56 jayme: disabling puppet on P:calico::kubernetes for [[phab:T325268|T325268]]
* 09:54 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:53 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 09:51 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45849 and previous config saved to /var/cache/conftool/dbconfig/20230314-094828-marostegui.json
* 09:42 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:36 moritzm: installing NSS security updates
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45848 and previous config saved to /var/cache/conftool/dbconfig/20230314-093321-marostegui.json
* 09:32 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:23 Emperor: reboot ms-be2040 [[phab:T331860|T331860]]
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45847 and previous config saved to /var/cache/conftool/dbconfig/20230314-090649-marostegui.json
* 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45846 and previous config saved to /var/cache/conftool/dbconfig/20230314-084249-marostegui.json
* 08:38 vgutierrez: test HAProxy 2.6.10 in cp4044 and cp4045
* 08:31 vgutierrez: fetch haproxy 2.6.10 for thirdparty/haproxy26 (buster && bullseye) @ apt.wm.o
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45845 and previous config saved to /var/cache/conftool/dbconfig/20230314-082743-marostegui.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45843 and previous config saved to /var/cache/conftool/dbconfig/20230314-081236-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45842 and previous config saved to /var/cache/conftool/dbconfig/20230314-075730-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45841 and previous config saved to /var/cache/conftool/dbconfig/20230314-073210-marostegui.json
* 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45840 and previous config saved to /var/cache/conftool/dbconfig/20230314-073149-marostegui.json
* 07:26 marostegui: Migrate db1183 to mariadb m5 eqiad dbmaint 10.6 [[phab:T322294|T322294]]
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45839 and previous config saved to /var/cache/conftool/dbconfig/20230314-071643-marostegui.json
* 07:13 marostegui: Migrate db2135 to mariadb m5 codfw dbmaint 10.6
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45838 and previous config saved to /var/cache/conftool/dbconfig/20230314-070137-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45837 and previous config saved to /var/cache/conftool/dbconfig/20230314-064630-marostegui.json
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts centrallog1001
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 06:41 hashar: gerrit: changed `operations/puppet` merge strategy to allow "content merges" (see `ops` list for the rationale)
* 06:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 06:34 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 06:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts centrallog1001
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45836 and previous config saved to /var/cache/conftool/dbconfig/20230314-061633-marostegui.json
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 06:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:07 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 05:05 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@61ef435]: 0.3.122 (duration: 08m 45s)
* 04:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.122` on canary `wdqs1003`; proceeding to rest of fleet
* 04:56 ryankemper@deploy2002: Started deploy [wdqs/wdqs@61ef435]: 0.3.122
* 04:56 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.122`. Pre-deploy tests passing on canary `wdqs1003`
* 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.25 (duration: 02m 20s)
* 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]] (duration: 51m 02s)
* 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]]
* 02:22 legoktm: removed user's 2FA on wikitech for [[phab:T331955|T331955]]
* 02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45835 and previous config saved to /var/cache/conftool/dbconfig/20230314-022023-marostegui.json
* 02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45834 and previous config saved to /var/cache/conftool/dbconfig/20230314-020517-marostegui.json
* 01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45833 and previous config saved to /var/cache/conftool/dbconfig/20230314-015011-marostegui.json
* 01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45832 and previous config saved to /var/cache/conftool/dbconfig/20230314-013504-marostegui.json
* 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45831 and previous config saved to /var/cache/conftool/dbconfig/20230314-012442-marostegui.json
* 01:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 01:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45830 and previous config saved to /var/cache/conftool/dbconfig/20230314-012421-marostegui.json
* 01:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45829 and previous config saved to /var/cache/conftool/dbconfig/20230314-010915-marostegui.json
* 00:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45828 and previous config saved to /var/cache/conftool/dbconfig/20230314-005409-marostegui.json
* 00:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45827 and previous config saved to /var/cache/conftool/dbconfig/20230314-003903-marostegui.json
* 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45826 and previous config saved to /var/cache/conftool/dbconfig/20230314-002840-marostegui.json
* 00:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 00:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45825 and previous config saved to /var/cache/conftool/dbconfig/20230314-002819-marostegui.json
* 00:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45824 and previous config saved to /var/cache/conftool/dbconfig/20230314-001313-marostegui.json
 
== 2023-03-13 ==
* 23:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45823 and previous config saved to /var/cache/conftool/dbconfig/20230313-235807-marostegui.json
* 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45822 and previous config saved to /var/cache/conftool/dbconfig/20230313-234301-marostegui.json
* 23:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 23:33 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 23:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45821 and previous config saved to /var/cache/conftool/dbconfig/20230313-233127-marostegui.json
* 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 23:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45820 and previous config saved to /var/cache/conftool/dbconfig/20230313-233050-marostegui.json
* 23:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45819 and previous config saved to /var/cache/conftool/dbconfig/20230313-231544-marostegui.json
* 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45818 and previous config saved to /var/cache/conftool/dbconfig/20230313-230038-marostegui.json
* 22:48 zabe@deploy2002: Finished scap: [[gerrit:898037{{!}}noc: Switch default selection on db.php from eqiad to codfw]] (duration: 06m 56s)
* 22:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45817 and previous config saved to /var/cache/conftool/dbconfig/20230313-224532-marostegui.json
* 22:41 zabe@deploy2002: Started scap: [[gerrit:898037{{!}}noc: Switch default selection on db.php from eqiad to codfw]]
* 22:40 zabe@deploy2002: scap failed: BrokenPipeError [Errno 32] Broken pipe (duration: 00m 00s)
* 22:40 zabe@deploy2002: Started scap: [[gerrit:898037]]
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45816 and previous config saved to /var/cache/conftool/dbconfig/20230313-223331-marostegui.json
* 22:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 22:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45815 and previous config saved to /var/cache/conftool/dbconfig/20230313-223309-marostegui.json
* 22:30 sbassett@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Set ext:StopForumSpam to enforce on es.wikiversity (duration: 06m 59s)
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45814 and previous config saved to /var/cache/conftool/dbconfig/20230313-221803-marostegui.json
* 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45813 and previous config saved to /var/cache/conftool/dbconfig/20230313-220257-marostegui.json
* 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45812 and previous config saved to /var/cache/conftool/dbconfig/20230313-214751-marostegui.json
* 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45811 and previous config saved to /var/cache/conftool/dbconfig/20230313-213544-marostegui.json
* 21:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 21:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45810 and previous config saved to /var/cache/conftool/dbconfig/20230313-213523-marostegui.json
* 21:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS bullseye
* 21:21 wfan: remove -d for jobs-dlocal queue runner
* 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45809 and previous config saved to /var/cache/conftool/dbconfig/20230313-212017-marostegui.json
* 21:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45808 and previous config saved to /var/cache/conftool/dbconfig/20230313-210510-marostegui.json
* 21:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
* 21:01 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
* 21:01 ejegg: enabled jobs-dlocal queue runner
* 21:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45807 and previous config saved to /var/cache/conftool/dbconfig/20230313-205004-marostegui.json
* 20:47 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS bullseye
* 20:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein (duration: 00m 14s)
* 20:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein
* 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45806 and previous config saved to /var/cache/conftool/dbconfig/20230313-203824-marostegui.json
* 20:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 20:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45805 and previous config saved to /var/cache/conftool/dbconfig/20230313-203802-marostegui.json
* 20:27 kindrobot: close UTC late backport window
* 20:26 kindrobot@deploy2002: Finished scap: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]] (duration: 12m 11s)
* 20:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45804 and previous config saved to /var/cache/conftool/dbconfig/20230313-202256-marostegui.json
* 20:16 kindrobot@deploy2002: kindrobot and ksarabia: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:15 kindrobot: start UTC late backport window
* 20:14 kindrobot@deploy2002: Started scap: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]]
* 20:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45803 and previous config saved to /var/cache/conftool/dbconfig/20230313-200750-marostegui.json
* 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45802 and previous config saved to /var/cache/conftool/dbconfig/20230313-195244-marostegui.json
* 19:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 19:51 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
* 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45801 and previous config saved to /var/cache/conftool/dbconfig/20230313-194148-marostegui.json
* 19:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 19:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45800 and previous config saved to /var/cache/conftool/dbconfig/20230313-194116-marostegui.json
* 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 19:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
* 19:38 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
* 19:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
* 19:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45799 and previous config saved to /var/cache/conftool/dbconfig/20230313-192610-marostegui.json
* 19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45798 and previous config saved to /var/cache/conftool/dbconfig/20230313-191104-marostegui.json
* 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45797 and previous config saved to /var/cache/conftool/dbconfig/20230313-185558-marostegui.json
* 18:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:48 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:47 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45796 and previous config saved to /var/cache/conftool/dbconfig/20230313-184502-marostegui.json
* 18:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 18:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable (duration: 00m 13s)
* 18:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable
* 18:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:36 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date (duration: 00m 14s)
* 18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 18:36 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date
* 18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45795 and previous config saved to /var/cache/conftool/dbconfig/20230313-183628-marostegui.json
* 18:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45794 and previous config saved to /var/cache/conftool/dbconfig/20230313-182121-marostegui.json
* 18:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 18:11 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45793 and previous config saved to /var/cache/conftool/dbconfig/20230313-180615-marostegui.json
* 17:56 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 17:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45792 and previous config saved to /var/cache/conftool/dbconfig/20230313-175109-marostegui.json
* 17:50 dancy@deploy2002: Finished scap: test cleanup (duration: 06m 40s)
* 17:44 dancy@deploy2002: Started scap: test cleanup
* 17:43 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45791 and previous config saved to /var/cache/conftool/dbconfig/20230313-174030-marostegui.json
* 17:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 17:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45790 and previous config saved to /var/cache/conftool/dbconfig/20230313-174009-marostegui.json
* 17:35 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45789 and previous config saved to /var/cache/conftool/dbconfig/20230313-172503-marostegui.json
* 17:22 dancy@deploy2002: Finished scap: testing [[phab:T329857|T329857]] (duration: 06m 54s)
* 17:16 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 17:15 dancy@deploy2002: Started scap: testing [[phab:T329857|T329857]]
* 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:13 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:12 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:12 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:11 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 17:11 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:10 Emperor: roll-restart of codfw eqiad frontends
* 17:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:10 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:10 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45788 and previous config saved to /var/cache/conftool/dbconfig/20230313-170955-marostegui.json
* 17:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:08 dancy@deploy2002: Installation of scap version "4.46.0" completed for 553 hosts
* 17:07 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
* 17:04 bd808: Ran cache.purge_openstack_users() for Striker following deploy of {{Gerrit|e1f7491}} ([[phab:T331674|T331674]])
* 17:04 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45787 and previous config saved to /var/cache/conftool/dbconfig/20230313-165449-marostegui.json
* 16:47 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45785 and previous config saved to /var/cache/conftool/dbconfig/20230313-164410-marostegui.json
* 16:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 16:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45784 and previous config saved to /var/cache/conftool/dbconfig/20230313-164349-marostegui.json
* 16:36 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45783 and previous config saved to /var/cache/conftool/dbconfig/20230313-162843-marostegui.json
* 16:20 moritzm: imported tideways 5.0.4-2+wmf1+buster1+icu67u1  [[phab:T329491|T329491]]
* 16:18 dancy@deploy2002: Finished scap: testing (duration: 06m 53s)
* 16:17 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 16:17 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 16:17 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 16:16 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 16:16 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 16:16 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45782 and previous config saved to /var/cache/conftool/dbconfig/20230313-161337-marostegui.json
* 16:11 dancy@deploy2002: Started scap: testing
* 16:06 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: [[gerrit:897915{{!}} Bumping portals to master (T128546)]] (duration: 06m 15s)
* 16:00 moritzm: imported xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 16:00 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:897915{{!}} Bumping portals to master (T128546)]] (duration: 06m 43s)
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45781 and previous config saved to /var/cache/conftool/dbconfig/20230313-155830-marostegui.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45780 and previous config saved to /var/cache/conftool/dbconfig/20230313-154641-marostegui.json
* 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 15:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 15:35 moritzm: imported php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1+icu67u1 [[phab:T329491|T329491]]
* 15:31 dancy@deploy2002: Finished scap: testing [[phab:T329857|T329857]] (duration: 10m 08s)
* 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 15:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 15:21 dancy@deploy2002: Started scap: testing [[phab:T329857|T329857]]
* 15:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 15:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45779 and previous config saved to /var/cache/conftool/dbconfig/20230313-150523-marostegui.json
* 15:03 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 14:53 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 14:51 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 14:51 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 14:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45778 and previous config saved to /var/cache/conftool/dbconfig/20230313-145016-marostegui.json
* 14:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 14:38 jbond: disable puppet fleet wide to debug strange issue
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45777 and previous config saved to /var/cache/conftool/dbconfig/20230313-143510-marostegui.json
* 14:23 claime: switch noc.wikimedia.org from eqiad to codfw - [[phab:T331634|T331634]]
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45776 and previous config saved to /var/cache/conftool/dbconfig/20230313-142004-marostegui.json
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45774 and previous config saved to /var/cache/conftool/dbconfig/20230313-141409-marostegui.json
* 14:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 14:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45773 and previous config saved to /var/cache/conftool/dbconfig/20230313-141348-marostegui.json
* 14:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45772 and previous config saved to /var/cache/conftool/dbconfig/20230313-135842-marostegui.json
* 13:50 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
* 13:49 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
* 13:48 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
* 13:48 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6] (duration: 00m 11s)
* 13:48 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6]
* 13:47 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
* 13:46 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
* 13:45 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
* 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45770 and previous config saved to /var/cache/conftool/dbconfig/20230313-134336-marostegui.json
* 13:40 moritzm: imported wikidiff2 1.13.0-1+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45769 and previous config saved to /var/cache/conftool/dbconfig/20230313-132829-marostegui.json
* 13:25 moritzm: imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 [[phab:T329491|T329491]]
* 13:23 taavi@deploy2002: Finished scap: Backport for [[gerrit:897195{{!}}[trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399)]], [[gerrit:897837{{!}}[trwiki] Removing the temporary logo, previously added, and already reverted (T329047)]] (duration: 08m 10s)
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45768 and previous config saved to /var/cache/conftool/dbconfig/20230313-132123-marostegui.json
* 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45767 and previous config saved to /var/cache/conftool/dbconfig/20230313-132101-marostegui.json
* 13:16 taavi@deploy2002: taavi and superpes: Backport for [[gerrit:897195{{!}}[trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399)]], [[gerrit:897837{{!}}[trwiki] Removing the temporary logo, previously added, and already reverted (T329047)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:15 taavi@deploy2002: Started scap: Backport for [[gerrit:897195{{!}}[trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399)]], [[gerrit:897837{{!}}[trwiki] Removing the temporary logo, previously added, and already reverted (T329047)]]
* 13:13 taavi@deploy2002: Finished scap: Backport for [[gerrit:896321{{!}}zhwiki: Add movefile to extendedconfirmed (T331691)]] (duration: 09m 29s)
* 13:11 moritzm: imported php-luasandbox 4.0.2-3+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45766 and previous config saved to /var/cache/conftool/dbconfig/20230313-130555-marostegui.json
* 13:05 taavi@deploy2002: stang and taavi: Backport for [[gerrit:896321{{!}}zhwiki: Add movefile to extendedconfirmed (T331691)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:03 taavi@deploy2002: Started scap: Backport for [[gerrit:896321{{!}}zhwiki: Add movefile to extendedconfirmed (T331691)]]
* 13:00 moritzm: imported php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+buster1+icu67u1  [[phab:T329491|T329491]]
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45764 and previous config saved to /var/cache/conftool/dbconfig/20230313-125049-marostegui.json
* 12:48 hnowlan: restarting codfw thumbor instances to attempt to remedy 502 issues
* 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.codfw.wmnet
* 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.codfw.wmnet
* 12:37 moritzm: imported php-geoip 1.1.1-7+wmf2+buster1+icu67u1  [[phab:T329491|T329491]]
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45763 and previous config saved to /var/cache/conftool/dbconfig/20230313-123543-marostegui.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45762 and previous config saved to /var/cache/conftool/dbconfig/20230313-122928-marostegui.json
* 12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 12:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45761 and previous config saved to /var/cache/conftool/dbconfig/20230313-122906-marostegui.json
* 12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:19 moritzm: imported php-redis 5.3.2+4.3.0-2+deb11u1+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45760 and previous config saved to /var/cache/conftool/dbconfig/20230313-121400-marostegui.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45759 and previous config saved to /var/cache/conftool/dbconfig/20230313-115854-marostegui.json
* 11:58 moritzm: imported php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 11:46 moritzm: imported php-igbinary 3.2.1+2.0.8-2+wmf1+buster1+icu67u1  [[phab:T329491|T329491]]
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45758 and previous config saved to /var/cache/conftool/dbconfig/20230313-114348-marostegui.json
* 11:31 moritzm: imported php-apcu 5.1.19+4.0.11-3+wmf2+buster1+icu67u1 [[phab:T329491|T329491]]
* 11:22 jnuche@deploy2002: Installation of scap version "latest" completed for 553 hosts
* 11:21 jnuche@deploy2002: Installing scap version "latest" for 553 hosts
* 11:11 moritzm: imported php-msgpack 2.1.2+0.5.7-2+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 10:55 moritzm: imported php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45757 and previous config saved to /var/cache/conftool/dbconfig/20230313-104322-marostegui.json
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45756 and previous config saved to /var/cache/conftool/dbconfig/20230313-104246-marostegui.json
* 10:38 moritzm: imported php-pcov 1.0.6-4+wmf1~buster1+icu67u1 [[phab:T329491|T329491]]
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45755 and previous config saved to /var/cache/conftool/dbconfig/20230313-102740-marostegui.json
* 10:26 moritzm: imported php-defaults 7.4+76+wmf1~buster2+icu67u1 [[phab:T329491|T329491]]
* 10:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55701
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45754 and previous config saved to /var/cache/conftool/dbconfig/20230313-101234-marostegui.json
* 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55701
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38193
* 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38193
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46632
* 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46632
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6663
* 10:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6663
* 10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45558
* 10:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45558
* 10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38082
* 10:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38082
* 10:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 668
* 10:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 668
* 10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:02 moritzm: imported dh-php 0.35+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45753 and previous config saved to /var/cache/conftool/dbconfig/20230313-095728-marostegui.json
* 09:55 vgutierrez: Enable haproxy hardening in cp hosts globally - [[phab:T323944|T323944]]
* 09:52 zabe@deploy2002: Finished scap: Backport for [[gerrit:896037{{!}}Drop loading of former extension Renameuser's i18n strings [Re-apply]]] (duration: 07m 40s)
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2150 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45752 and previous config saved to /var/cache/conftool/dbconfig/20230313-095119-marostegui.json
* 09:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 09:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45751 and previous config saved to /var/cache/conftool/dbconfig/20230313-095058-marostegui.json
* 09:48 jayme: pcc-worker1003:~# rm -r /srv/jenkins/puppet-compiler/40076 - / back to 70%
* 09:46 zabe@deploy2002: jforrester and zabe: Backport for [[gerrit:896037{{!}}Drop loading of former extension Renameuser's i18n strings [Re-apply]]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 09:45 jayme: pcc-worker1002:~# rm -r /srv/jenkins/puppet-compiler/40078 - / back to 47% usage
* 09:44 zabe@deploy2002: Started scap: Backport for [[gerrit:896037{{!}}Drop loading of former extension Renameuser's i18n strings [Re-apply]]]
* 09:44 zabe@deploy2002: Finished scap: Backport for [[gerrit:897804{{!}}Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)]] (duration: 07m 52s)
* 09:40 jayme: pcc-worker1001:~# rm -r /srv/jenkins/puppet-compiler/40079 /srv/jenkins/puppet-compiler/38943 - / back to 68% usage
* 09:38 zabe@deploy2002: zabe: Backport for [[gerrit:897804{{!}}Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:36 zabe@deploy2002: Started scap: Backport for [[gerrit:897804{{!}}Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)]]
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45750 and previous config saved to /var/cache/conftool/dbconfig/20230313-093552-marostegui.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45749 and previous config saved to /var/cache/conftool/dbconfig/20230313-092045-marostegui.json
* 09:16 moritzm: installing python-werkzeug security updates
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45748 and previous config saved to /var/cache/conftool/dbconfig/20230313-090539-marostegui.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2122 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45747 and previous config saved to /var/cache/conftool/dbconfig/20230313-085937-marostegui.json
* 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45746 and previous config saved to /var/cache/conftool/dbconfig/20230313-085916-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45745 and previous config saved to /var/cache/conftool/dbconfig/20230313-084409-marostegui.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45744 and previous config saved to /var/cache/conftool/dbconfig/20230313-082903-marostegui.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45743 and previous config saved to /var/cache/conftool/dbconfig/20230313-081357-marostegui.json
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2121 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45742 and previous config saved to /var/cache/conftool/dbconfig/20230313-080759-marostegui.json
* 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45741 and previous config saved to /var/cache/conftool/dbconfig/20230313-080738-marostegui.json
* 08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 08:02 moritzm: installing curl security updates
* 07:58 zabe@deploy2002: Finished scap: Backport for [[gerrit:897188{{!}}use core Renameuser classes (T27482)]], [[gerrit:897187{{!}}UserRenameHandler: Use core RenameUser classes (T27482)]] (duration: 07m 02s)
* 07:53 zabe@deploy2002: zabe: Backport for [[gerrit:897188{{!}}use core Renameuser classes (T27482)]], [[gerrit:897187{{!}}UserRenameHandler: Use core RenameUser classes (T27482)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45740 and previous config saved to /var/cache/conftool/dbconfig/20230313-075232-marostegui.json
* 07:51 zabe@deploy2002: Started scap: Backport for [[gerrit:897188{{!}}use core Renameuser classes (T27482)]], [[gerrit:897187{{!}}UserRenameHandler: Use core RenameUser classes (T27482)]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45739 and previous config saved to /var/cache/conftool/dbconfig/20230313-073725-marostegui.json
* 07:37 marostegui: Remove pagetriage_log from enwiki [[phab:T328309|T328309]]
* 07:32 kartik@deploy2002: Finished scap: Backport for [[gerrit:897404{{!}}testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)]] (duration: 17m 04s)
* 07:25 kartik@deploy2002: kartik: Backport for [[gerrit:897404{{!}}testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45738 and previous config saved to /var/cache/conftool/dbconfig/20230313-072219-marostegui.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2120 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45737 and previous config saved to /var/cache/conftool/dbconfig/20230313-071522-marostegui.json
* 07:15 kartik@deploy2002: Started scap: Backport for [[gerrit:897404{{!}}testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)]]
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45736 and previous config saved to /var/cache/conftool/dbconfig/20230313-071501-marostegui.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45735 and previous config saved to /var/cache/conftool/dbconfig/20230313-065954-marostegui.json
* 06:52 marostegui_: Remove pagetriage_log from testwiki and test2wiki [[phab:T328309|T328309]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45734 and previous config saved to /var/cache/conftool/dbconfig/20230313-064448-marostegui.json
* 06:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9873
* 06:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9873
* 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9507
* 06:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9507
* 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15830
* 06:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15830
* 06:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9902
* 06:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9902
* 06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45733 and previous config saved to /var/cache/conftool/dbconfig/20230313-062942-marostegui.json
* 06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
* 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34549
* 06:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34549
* 06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 29357
* 06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 29357
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45732 and previous config saved to /var/cache/conftool/dbconfig/20230313-062244-marostegui.json
* 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 06:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138886
* 06:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138886
* 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
* 06:16 marostegui_: Deploy schema change on s3 codfw dbmaint [[phab:T329684|T329684]]
* 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 04:37 kart_: Updated cxserver to 2023-03-09-061555-production ([[phab:T331097|T331097]], [[phab:T327102|T327102]], [[phab:T326541|T326541]])
* 04:19 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 04:19 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 04:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 04:17 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 04:12 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 04:12 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
 
== 2023-03-12 ==
* 10:47 elukey: reset offsets on kafka jumbo for benthos webrequest live (as indicated in https://phabricator.wikimedia.org/T331801#8685569)
* 07:50 elukey: restart benthos-webrequest-live on centrallog1002 - [[phab:T331801|T331801]]
* 07:49 elukey: restart benthos-webrequest-live on centrallog2002 - [[phab:T331801|T331801]]
* 07:49 elukey: stop and mask benthos-webrequest-live on centrallog1001 - [[phab:T331801|T331801]]
 
== 2023-03-10 ==
* 22:43 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 22:32 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 22:26 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 22:16 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 21:24 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 21:14 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 21:13 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 21:03 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:43 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78] (duration: 00m 10s)
* 20:43 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78]
* 20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 19:39 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 00m 09s)
* 19:38 milimetric@deploy2002: Started deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942]
* 19:38 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 08m 08s)
* 19:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:30 milimetric@deploy2002: Started deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942]
* 19:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
* 19:17 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
* 19:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 19:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
* 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 18:55 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944] (duration: 00m 12s)
* 18:55 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944]
* 18:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
* 18:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 18:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 18:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 18:31 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 18:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 18:12 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 18:04 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:59 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:53 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:52 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:40 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:34 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:28 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:22 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 16:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 16:42 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 16:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 16:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 16:04 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2003-dev']
* 16:04 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:59 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:59 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:57 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:57 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2003-dev']
* 15:53 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:50 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:50 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
* 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
* 15:08 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
* 14:52 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
* 14:50 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 14:47 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 14:38 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
* 14:36 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
* 14:22 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
* 14:20 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
* 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:08 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
* 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
* 13:54 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
* 13:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:34 Emperor: restart swift-object-replicator on ms-be2067
* 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
* 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
* 12:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync data for new cloudsw1-b1-codfw device. - cmooney@cumin1001 - [[phab:T327919|T327919]]"
* 12:49 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync data for new cloudsw1-b1-codfw device. - cmooney@cumin1001 - [[phab:T327919|T327919]]"
* 12:46 moritzm: installing libsdl2 security updates
* 12:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for privte loopback ranges codfw. - cmooney@cumin1001"
* 12:31 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for privte loopback ranges codfw. - cmooney@cumin1001"
* 12:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for privte loopback ranges codfw. - cmooney@cumin1001"
* 12:18 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for privte loopback ranges codfw. - cmooney@cumin1001"
* 12:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:15 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 12:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:15 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 11:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host urldownloader1004.wikimedia.org with OS bullseye
* 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on urldownloader1004.wikimedia.org with reason: host reimage
* 11:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on urldownloader1004.wikimedia.org with reason: host reimage
* 11:35 moritzm: installing isc-dhcp bugfix updates from DLA 3326
* 11:20 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:20 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1004.wikimedia.org with OS bullseye
* 11:04 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=jawiki --logwiki=metawiki --ignorestatus 'あ ーあーあーあーあー' 'ARIAUSO' # [[phab:T331685|T331685]]
* 11:03 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'ZSTK Lublin' 'Sonabet4' # [[phab:T331685|T331685]]
* 11:01 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'Yair.herman' 'Manor258' # [[phab:T331685|T331685]]
* 10:58 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=afwiki --logwiki=metawiki --ignorestatus 'Tranquill Komnin' 'Nevechear' # [[phab:T331685|T331685]]
* 10:58 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'Tosikuni Japan' 'Revisionist14' # [[phab:T331685|T331685]]
* 10:54 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'Studio 7 Piaseczno Jarosław Zawadzki' 'Jarosław Andrzej Zawadzki (muzyk)' # [[phab:T331685|T331685]]
* 10:52 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=afwiki --logwiki=metawiki --ignorestatus 'Siniy7' 'Viktorbublik' # [[phab:T331685|T331685]]
* 10:51 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=arwiki --logwiki=metawiki --ignorestatus 'Reza amjad(iran)' 'رضا امجد (تبریز)' # [[phab:T331685|T331685]]
* 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'Mac700' 'Unknown001100' # [[phab:T331685|T331685]]
* 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'HonzaSTECH' 'ShadyMedic' # [[phab:T331685|T331685]]
* 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'ExplosiveCreeper294' 'NotGalxyGaming' # [[phab:T331685|T331685]]
* 10:41 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Mac700' 'Unknown001100' # [[phab:T331685|T331685]]
* 10:41 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'HonzaSTECH' 'ShadyMedic' # [[phab:T331685|T331685]]
* 10:40 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'ExplosiveCreeper294' 'NotGalxyGaming' # [[phab:T331685|T331685]]
* 09:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:58 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove netbox-generated DNS records which have been defined manually. - cmooney@cumin1001"
* 09:57 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove netbox-generated DNS records which have been defined manually. - cmooney@cumin1001"
* 09:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 02:09 zabe@deploy2002: Finished scap: [[phab:T331685|T331685]] (duration: 07m 52s)
* 02:02 zabe@deploy2002: Started scap: [[phab:T331685|T331685]]
* 02:01 zabe@deploy2002: Finished scap: [[phab:T331685|T331685]] (duration: 07m 28s)
* 02:00 ejegg: SmashPig upgraded from {{Gerrit|c6775c60}} to {{Gerrit|3b84e4cb}}
* 01:55 ejegg: payments-wiki upgraded from {{Gerrit|05a5e09a}} to {{Gerrit|61c30a4f}}
* 01:54 zabe@deploy2002: Started scap: [[phab:T331685|T331685]]
* 01:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
 
== 2023-03-09 ==
* 23:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7b25fbf]: import_ttl: correct date formatting (duration: 00m 14s)
* 23:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7b25fbf]: import_ttl: correct date formatting
* 23:33 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b122672]: import_ttl: replace HdfsSensor with URLSensor (duration: 00m 14s)
* 23:32 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b122672]: import_ttl: replace HdfsSensor with URLSensor
* 23:09 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 23:09 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 23:04 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 23:04 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
* 23:01 sukhe: pool new dns hosts dns1003 and dns2003: [[phab:T330670|T330670]]
* 22:53 sukhe: run homer in cr*-<nowiki>{</nowiki>codfw,eqiad<nowiki>}</nowiki> for CR 896190
* 22:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 22:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2003.wikimedia.org with OS bullseye
* 22:43 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 22:41 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 22:40 bd808: Forced puppet run on cloudweb100[34] to apply quick fix for [[phab:T331674|T331674]]
* 22:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for new links to cloudsw1-b1-codfw - cmooney@cumin1001"
* 22:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for new links to cloudsw1-b1-codfw - cmooney@cumin1001"
* 22:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1003.wikimedia.org with OS bullseye
* 22:20 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 22:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2003.wikimedia.org with reason: host reimage
* 22:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 22:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2003.wikimedia.org with reason: host reimage
* 22:14 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
* 22:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2003.wikimedia.org with OS bullseye
* 22:02 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns2003.wikimedia.org with OS bullseye
* 21:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2003.wikimedia.org with OS bullseye
* 21:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1003.wikimedia.org with reason: host reimage
* 21:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 21:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1003.wikimedia.org with reason: host reimage
* 21:38 TheresNoTime: close UTC late backport
* 21:37 samtar@deploy2002: Finished scap: Backport for [[gerrit:893542{{!}}Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829)]] (duration: 10m 43s)
* 21:35 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
* 21:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
* 21:28 samtar@deploy2002: samtar and nray: Backport for [[gerrit:893542{{!}}Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 21:27 samtar@deploy2002: Started scap: Backport for [[gerrit:893542{{!}}Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829)]]
* 21:24 samtar@deploy2002: Finished scap: Backport for [[gerrit:895352{{!}}Unload RenameUser, now part of core: Part II of II]] (duration: 07m 38s)
* 21:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust and remove reverse DNS records after cloudsw1-b1-codfw migration. - cmooney@cumin1001"
* 21:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to enable incr shard recovery throughput - ryankemper@cumin1001 - [[phab:T317816|T317816]]
* 21:18 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust and remove reverse DNS records after cloudsw1-b1-codfw migration. - cmooney@cumin1001"
* 21:18 samtar@deploy2002: samtar and jforrester: Backport for [[gerrit:895352{{!}}Unload RenameUser, now part of core: Part II of II]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:17 samtar@deploy2002: Started scap: Backport for [[gerrit:895352{{!}}Unload RenameUser, now part of core: Part II of II]]
* 21:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 21:14 samtar@deploy2002: Finished scap: Backport for [[gerrit:895351{{!}}Unload RenameUser, now part of core: Part I of II]] (duration: 12m 19s)
* 21:10 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns2003
* 21:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
* 21:09 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns2003
* 21:09 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1003
* 21:08 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns1003
* 21:07 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
* 21:03 samtar@deploy2002: samtar and jforrester: Backport for [[gerrit:895351{{!}}Unload RenameUser, now part of core: Part I of II]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 21:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns2003.mgmt.codfw.wmnet on all recursors
* 21:02 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns2003.mgmt.codfw.wmnet on all recursors
* 21:02 samtar@deploy2002: Started scap: Backport for [[gerrit:895351{{!}}Unload RenameUser, now part of core: Part I of II]]
* 20:59 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns2003.wikimedia.org on all recursors
* 20:59 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns2003.wikimedia.org on all recursors
* 20:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
* 20:47 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:47 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns2003 (renamed from authdns2001) - sukhe@cumin2002"
* 20:46 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns2003 (renamed from authdns2001) - sukhe@cumin2002"
* 20:44 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 20:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1003.wikimedia.org']
* 20:30 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1003.wikimedia.org']
* 20:25 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
* 20:24 topranks: move cloud-hosts1-b-codfw GW from core routers to cloudsw1-b1-codfw [[phab:T327919|T327919]]
* 20:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
* 20:12 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns1003.wikimedia.org on all recursors
* 20:12 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns1003.wikimedia.org on all recursors
* 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
* 20:07 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
* 20:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 19:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to enable incr shard recovery throughput - ryankemper@cumin1001 - [[phab:T317816|T317816]]
* 19:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on an-worker1078.eqiad.wmnet with reason: Replacing RAID BBU
* 19:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on an-worker1078.eqiad.wmnet with reason: Replacing RAID BBU
* 19:15 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1003
* 19:15 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns1003
* 19:14 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:14 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
* 19:12 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
* 19:10 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.26  refs [[phab:T330204|T330204]]
* 19:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 18:53 sukhe: enable puppet on A:dns-rec and force puppet run: [[phab:T330670|T330670]]
* 18:50 mforns@deploy2002: Finished deploy [airflow-dags/analytics@3419b7d]: (no justification provided) (duration: 00m 10s)
* 18:50 mforns@deploy2002: Started deploy [airflow-dags/analytics@3419b7d]: (no justification provided)
* 18:47 sukhe: enable puppet on dns4003 to merge 895894
* 18:44 sukhe: disable puppet on A:dns-rec to merge CR 895894
* 18:38 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 18:38 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 18:34 sukhe: [correction] homer "cr*-codfw*" commit "Remove authdns2001 from homer, [[phab:T330670|T330670]]"
* 18:34 sukhe: homer "cr*-codfw*" commit "Remove authdns1001 from homer, [[phab:T330670|T330670]]"
* 18:31 sukhe: homer "cr*-eqiad*" commit "Remove authdns1001 from homer, [[phab:T330670|T330670]]"
* 18:26 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts authdns[1001,2001].wikimedia.org
* 18:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:25 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: authdns[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:24 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: authdns[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:22 sukhe: running puppet-agent on A:dns-auth to remove deprecated authdns[12]001
* 18:22 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 18:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:15 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts authdns[1001,2001].wikimedia.org
* 18:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 18:10 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 18:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 18:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:09 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 18:09 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 18:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 18:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:00 sukhe: cr*-codfw [ns0]: set routing-options static route 208.80.154.238/32 next-hop 208.80.153.77: [[phab:T330670|T330670]]
* 17:53 sukhe: cr*-codfw [ns1]: set routing-options static route 208.80.153.231/32 next-hop 208.80.153.77: [[phab:T