
Server Admin Log: Difference between revisions

imported>Stashbot
(bstorm_: cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 T251598)
imported>Stashbot
(brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet)
 
(893 intermediate revisions by 4 users not shown)
== 2020-05-08 ==
== 2023-01-28 ==
* 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 [[phab:T251598|T251598]]
* 00:36 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
* 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view on labsdb1012 [[phab:T251598|T251598]]
* 00:35 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
* 21:33 bstorm_: cleaning up wb_terms_no_longer_updated view on labsdb1009 [[phab:T251598|T251598]]
* 00:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS bullseye
* 21:06 ottomata: running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - [[phab:T252203|T252203]]
* 19:16 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:12 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:07 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:12 andrewbogott: reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for [[phab:T252121|T252121]]
* 17:59 marostegui: Extend /srv by 500G on labsdb1011 [[phab:T249188|T249188]]
* 16:55 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:53 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:36 ottomata: starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - [[phab:T252203|T252203]]
* 15:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:27 ottomata: stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - [[phab:T252203|T252203]]
* 14:50 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s)
* 14:50 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only
* 14:05 akosiaris: [[phab:T243106|T243106]] undo experiment with DROP iptable rules this time around. Use mw1331, mw1348
* 13:22 vgutierrez: rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - [[phab:T249335|T249335]]
* 13:20 akosiaris: [[phab:T243106|T243106]] redo experiment with DROP iptable rules this time around. Use mw1331, mw1348
* 13:16 akosiaris: [[phab:T243106|T243106]] undo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment done successfully, no issues to the infrastructure.
* 12:49 akosiaris: [[phab:T243106|T243106]] redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348
* 12:49 akosiaris: [[phab:T243106|T243106]] redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle
* 11:49 hnowlan: restarting cassandra on restbase2009 for java updates
* 11:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 akosiaris: repool eqiad eventgate-analytics. Test concluded
* 11:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:54 mutante: disabling puppet on puppetmasters temporarily to switch them carefully to use httpd module and not apache module which we want to get rid of
* 09:52 akosiaris: depool eqiad eventgate-analytics for a test involving reinitializing the eqiad kubernetes cluster
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:51 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:45 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=eventgate-analytics.*
* 08:20 vgutierrez: rolling restart of ats-tls on esams - [[phab:T249335|T249335]]
* 07:19 vgutierrez: ats-tls restart on cp3050 and cp3052 (max_connections_active_in experiment) - [[phab:T249335|T249335]]
* 07:07 mutante: phabricator rmdir /var/run/phd/pid - empty and now unused
* 07:01 moritzm: installing php5 security updates
* 05:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:10 marostegui: Upgrade pc1010
* 00:30 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert all wikis except test to 1.35.0-wmf.30 for [[phab:T252179|T252179]]
* 00:19 brennen: rolling 1.35.0-wmf.31 train back to group0 for [[phab:T252179|T252179]]


== 2020-05-07 ==
== 2023-01-27 ==
* 22:36 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 23:55 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 22:31 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Scribunto/includes/engines/LuaCommon/TitleLibrary.php: [[gerrit:595054{{!}}Handle RevisionAccessException with try-catch (T252156)]] (duration: 01m 08s)
* 23:52 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 20:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 23:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
* 20:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:31 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS bullseye
* 20:10 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingStreamNames: set initial stream names, as yet unused - [[phab:T238230|T238230]] (duration: 01m 07s)
* 23:22 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
* 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.35.0-wmf.30
* 23:21 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
* 19:09 brennen: rolling 1.35.0-wmf.31 back to group1
* 22:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
* 19:09 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki1001 - [[phab:T252010|T252010]]
* 22:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
* 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 22:20 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
* 18:25 ppchelko@deploy1001: Finished deploy [changeprop/deploy@383fba5]: Enable both purging types [[phab:T252142|T252142]] (duration: 01m 17s)
* 22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.2-1+deb11u1_amd64.changes  # [[phab:T328162|T328162]]
* 18:23 ppchelko@deploy1001: Started deploy [changeprop/deploy@383fba5]: Enable both purging types [[phab:T252142|T252142]]
* 22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.2-1_amd64.changes  # [[phab:T328162|T328162]]
* 18:15 Urbanecm: Morning SWAT done
* 22:00 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|899c175}}: Update project icons to refreshed SVGs ([[phab:T249047|T249047]]; part 2/2) (duration: 01m 06s)
* 21:59 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
* 18:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|899c175}}: Update project icons to refreshed SVGs ([[phab:T249047|T249047]]; part 1/2) (duration: 01m 08s)
* 21:51 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|54bd2f1}}: Add the investigate right to the checkuser group on testwiki ([[phab:T251932|T251932]]) (duration: 01m 08s)
* 21:49 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
* 17:50 bsitzmann@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS bullseye
* 17:46 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
* 17:44 bsitzmann@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 20:26 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
* 17:44 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: (no justification provided) (duration: 05m 31s)
* 20:05 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
* 17:38 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: (no justification provided)
* 20:02 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
* 17:18 ejegg: updated payments-wiki from {{Gerrit|afb84cc391}} to {{Gerrit|dabba1804c}}
* 19:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
* 16:46 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption (duration: 01m 05s)
* 19:38 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
* 16:45 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption
* 19:32 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
* 16:42 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
* 16:36 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp404.ulsfo.wmnet
* 16:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
* 16:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:02 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
* 16:29 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 18:57 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
* 16:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
* 16:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:37 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
* 16:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
* 16:26 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic (duration: 01m 45s)
* 18:24 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:14 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS bullseye
* 16:24 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic
* 17:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
* 16:23 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic (duration: 00m 24s)
* 17:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
* 16:23 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic
* 17:38 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided) (duration: 00m 14s)
* 15:59 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:38 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided)
* 15:51 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:28 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
* 15:36 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Collection/includes/Specials/SpecialCollection.php: [[phab:T251460|T251460]] Set skin on BaseTemplates if you are using getSkin (duration: 01m 08s)
* 17:28 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4048.ulsfo.wmnet with OS bullseye
* 15:28 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 17:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
* 15:27 vgutierrez: rolling restart of ats-tls on text@esams - [[phab:T249335|T249335]]
* 15:50 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 04s)
* 15:26 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:50 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 15:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=ats-be
* 15:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=cdn
* 15:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
* 15:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=ats-be
* 15:12 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=cdn
* 15:09 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2027.codfw.wmnet with OS bullseye
* 15:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
* 14:59 moritzm: imported component/facter3 for stretch-wikimedia into "main"
* 15:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
* 14:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
* 14:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
* 14:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:55 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:55 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
* 14:50 moritzm: imported component/puppet5 for stretch-wikimedia into "main"
* 14:46 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 14:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 14:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 14:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 14:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:41 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 14:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 moritzm: installing install3002 [[phab:T327867|T327867]]
* 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:39 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 14:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:27 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
* 14:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:27 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:26 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 14:17 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:22 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
* 14:07 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
* 14:06 moritzm: imported component/facter3 for jessie-wikimedia into "main"
* 14:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 13:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 13:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 13:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
* 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:46 moritzm: installing install5002 [[phab:T327867|T327867]]
* 13:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:08 moritzm: installing install6002 [[phab:T327867|T327867]]
* 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 12:47 hashar: gerrit1001 running Puppet to deploy https://gerrit.wikimedia.org/r/883965 and restarting Apache 2 to change the `Listen` statements # [[phab:T326125|T326125]]
* 13:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:42 hashar: Rebooting gerrit2002
* 13:04 jynus: disabling puppet on all db hosts to control deployment of new paging alert [[phab:T172489|T172489]]
* 12:38 hashar: Stopped Puppet on gerrit1001 to prevent auto deployment of https://gerrit.wikimedia.org/r/883965
* 13:02 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers (duration: 02m 43s)
* 12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
* 13:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 12:59 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers
* 12:23 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:03 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided) (duration: 00m 15s)
* 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided)
* 12:43 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI (duration: 16m 20s)
* 12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 12:27 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI
* 12:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138915
* 12:13 addshore@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Wikibase: [[gerrit:594920]] [[phab:T252079|T252079]] Revert "Move prefetching-term-lookup-callback service wiring" (duration: 01m 12s)
* 12:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 12:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138915
* 12:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9318
* 11:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9318
* 11:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55821
* 11:33 moritzm: imported component/puppet5 for jessie-wikimedia into "main"
* 11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55821
* 11:31 jbond42: enable ferm-status script https://gerrit.wikimedia.org/r/c/operations/puppet/+/576102
* 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398143
* 11:10 matthiasmullie: EU swat done
* 11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398143
* 11:07 mlitn@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikibaseMediaInfo/: [MediaInfo] Add dummy concept chips without thumbnail (duration: 01m 09s)
* 11:57 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
* 10:07 moritzm: installing Java security updates on restbase/sessionstore
* 11:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26077
* 09:11 elukey: roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary)
* 11:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26077
* 08:32 moritzm: upgrading restbase-dev to latest OpenJDK security update
* 11:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 50266
* 08:06 jynus: setting pc2007, pc2009 as read-write
* 11:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 50266
* 07:44 godog: further decrease weight for ms-be10[678] - [[phab:T252008|T252008]]
* 11:54 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14593
* 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14593
* 05:33 elukey: restart hadoop yarn nodemanager on analytics1071
* 11:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56898
* 05:22 marostegui: Reimage db2078
* 11:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56898
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only=off for maintenance [[phab:T251158|T251158]]', diff saved to https://phabricator.wikimedia.org/P11167 and previous config saved to /var/cache/conftool/dbconfig/20200507-050419-marostegui.json
* 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8368
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only for maintenance [[phab:T251158|T251158]]', diff saved to https://phabricator.wikimedia.org/P11166 and previous config saved to /var/cache/conftool/dbconfig/20200507-050046-marostegui.json
* 11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8368
* 02:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.30 for [[phab:T252079|T252079]]
* 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8560
* 02:55 brennen: reverting group1 to 1.35.0-wmf.30 for [[phab:T252079|T252079]]
* 11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8560
* 00:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34309
* 11:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34309
* 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12033
* 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12033
* 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62537
* 11:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62537
* 11:41 XioNoX: restart keyholder on deploy1002
* 11:41 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 11:40 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 11:38 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 11:36 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 11:27 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:26 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 56s)
* 11:25 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
* 11:25 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 11:24 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
* 11:24 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:15 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
* 11:15 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
* 11:15 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
* 11:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
* 11:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
* 11:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
* 11:12 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
* 11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
* 11:11 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp1001.wikimedia.org
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:08 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 11:08 aborrero@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:05 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
* 11:04 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: apply on main
* 11:04 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 11:03 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 11:01 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: apply on main
* 11:01 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ldap-corp1001.wikimedia.org
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
* 10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
* 10:38 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
* 10:37 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
* 10:37 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:26 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp2001.wikimedia.org
* 10:23 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp2001.wikimedia.org
* 09:40 moritzm: disabling old bastions bast3005/bast4003/bast5002/bast6001, use bast3006/bast4004/bast5003/bast6002 instead
* 08:23 marostegui: Apply schema change on labtestwiki (clouddb2002-dev) [[phab:T328086|T328086]]
* 08:22 marostegui: Apply schema change on db1106 (s1 enwiki) [[phab:T328086|T328086]]
* 08:06 elukey: restart kube-apiserver on ml-staging-ctrl2* nodes as an attempt to mitigate some LIST API high latency
* 07:41 elukey: restart kube-apiserver on ml-serve-ctrl2* nodes as an attempt to mitigate some 504 API response errors
* 01:15 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
* 01:11 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
* 01:10 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4047.ulsfo.wmnet with OS bullseye
* 00:56 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
* 00:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
* 00:45 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
* 00:33 zabe@deploy1002: Finished scap: Backport for [[gerrit:884137{{!}}Stop setting cul_actor migration var (T233004)]] (duration: 07m 36s)
* 00:27 zabe@deploy1002: zabe: Backport for [[gerrit:884137{{!}}Stop setting cul_actor migration var (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 00:26 zabe@deploy1002: Started scap: Backport for [[gerrit:884137{{!}}Stop setting cul_actor migration var (T233004)]]
* 00:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
* 00:24 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
* 00:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
* 00:15 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
* 00:11 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
* 00:10 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye


== 2020-05-06 ==
== 2023-01-26 ==
* 23:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable GrowthExperiments guidance on testwiki (duration: 01m 07s)
* 23:59 zabe@deploy1002: Finished scap: Backport for [[gerrit:883724{{!}}Add a project logo on gorwiktionary (T327987)]] (duration: 34m 42s)
* 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable password-reset-update on Wikipedias ([[phab:T245791|T245791]]) (duration: 01m 07s)
* 23:54 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
* 22:22 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/includes/revisionlist/RevisionItem.php: [[gerrit:594803{{!}}RevisionItem: Fix providing timestamp in getRevisionLink ]] (duration: 01m 09s)
* 23:52 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4039.ulsfo.wmnet
* 21:45 andrewbogott: updating puppet compiler facts
* 23:51 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS bullseye
* 21:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
* 21:05 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 23:26 zabe@deploy1002: zabe and superpes: Backport for [[gerrit:883724{{!}}Add a project logo on gorwiktionary (T327987)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 21:04 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:25 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
* 20:35 ejegg: updated Fundraising CiviCRM from {{Gerrit|b15b2cfbb5}} to {{Gerrit|cfb6101e39}}
* 23:24 zabe@deploy1002: Started scap: Backport for [[gerrit:883724{{!}}Add a project logo on gorwiktionary (T327987)]]
* 19:08 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 01m 08s)
* 23:13 sbassett@deploy1002: Synchronized private/PrivateSettings.php: [[phab:T326691|T326691]] - remove mitigation and monitor (duration: 06m 52s)
* 19:07 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
* 23:04 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
* 19:03 brennen: CORRECTION: 1.35.0-wmf.31 train unblocked ([[phab:T249963|T249963]]), rolling forward to group1
* 23:04 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
* 19:03 brennen: 1.35.0-wmf.31 train unblocked ([[phab:T249963|T249963]]), rolling forward to group0
* 23:03 zabe@deploy1002: Finished scap: Backport for [[gerrit:881390{{!}}Pin CheckUserEventTablesMigrationStage to read and write old (T324907)]] (duration: 08m 36s)
* 18:58 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594778/ fixes UBN [[phab:T252052|T252052]] (duration: 01m 09s)
* 22:56 zabe@deploy1002: dreamyjazz and zabe: Backport for [[gerrit:881390{{!}}Pin CheckUserEventTablesMigrationStage to read and write old (T324907)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:54 volans: upgraded spicerack to spicerack_0.0.34-1_amd64.deb on cumin[12]001
* 22:54 zabe@deploy1002: Started scap: Backport for [[gerrit:881390{{!}}Pin CheckUserEventTablesMigrationStage to read and write old (T324907)]]
* 18:45 volans: uploaded spicerack_0.0.34-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 22:45 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
* 18:44 volans@deploy1001: Finished deploy [homer/deploy@8224f0a]: Release v0.2.2 (duration: 00m 18s)
* 22:44 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
* 18:44 volans@deploy1001: Started deploy [homer/deploy@8224f0a]: Release v0.2.2
* 22:44 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS bullseye
* 18:28 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594768/ fixes [[phab:T252043|T252043]] (duration: 01m 08s)
* 22:23 zabe: running migrateRevisionCommentTemp.php in cebwiki in screen with --sleep 2 # [[phab:T275246|T275246]]
* 17:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 22:22 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
* 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 22:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
* 17:12 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:58 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
* 17:06 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:47 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:884055{{!}}Increase threshold for table of contents collapsing (T328045)]], [[gerrit:879664{{!}}Remove redundant block for search descriptions (T324859)]] (duration: 08m 49s)
* 17:05 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:40 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for [[gerrit:884055{{!}}Increase threshold for table of contents collapsing (T328045)]], [[gerrit:879664{{!}}Remove redundant block for search descriptions (T324859)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 16:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:39 thcipriani@deploy1002: Started scap: Backport for [[gerrit:884055{{!}}Increase threshold for table of contents collapsing (T328045)]], [[gerrit:879664{{!}}Remove redundant block for search descriptions (T324859)]]
* 15:41 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:36 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:884013{{!}}ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704)]] (duration: 08m 43s)
* 15:27 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:35 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
* 13:36 mutante: puppetmaster - revoking cert for webserver-misc-apps, recreating it with static-codereview.wikimedia.org as additional SAN ([[phab:T243056|T243056]])
* 21:34 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
* 13:32 hashar: Restarting CI Jenkins
* 21:33 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
* 13:27 mutante: puppetmaster - revoking cert for webserver-misc-static, not used anymore, merged into webserver-misc-apps
* 21:33 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
* 13:27 moritzm: installing graphicsmagick security updates
* 21:33 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS bullseye
* 13:26 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki2001 - [[phab:T252010|T252010]]
* 21:29 thcipriani@deploy1002: matmarex and thcipriani: Backport for [[gerrit:884013{{!}}ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:25 XioNoX: add routinator 3000 0.7.0 to buster-wikimedia - [[phab:T252010|T252010]]
* 21:27 thcipriani@deploy1002: Started scap: Backport for [[gerrit:884013{{!}}ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704)]]
* 13:19 ema: cp: upgrade purged to v0.10
* 21:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
* 13:08 godog: start swift decom ms-be101[678] - [[phab:T252008|T252008]]
* 21:25 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
* 11:22 kart_: EU SWAT done.
* 21:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
* 11:13 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:594668{{!}}Enable ContentTranslation in Armenian WP as a default tool (T249229)]] (duration: 01m 08s)
* 21:20 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:883952{{!}}Enable write new for CheckUserLog comment fields everywhere (T233004)]] (duration: 11m 18s)
* 10:27 ema: cp2027: test purged v0.10
* 21:11 thcipriani@deploy1002: thcipriani and dreamyjazz: Backport for [[gerrit:883952{{!}}Enable write new for CheckUserLog comment fields everywhere (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 10:20 moritzm: restarting apache on dbmonitor/grafana/miscweb/graphite/netmon to pick up openldap update
* 21:09 thcipriani@deploy1002: Started scap: Backport for [[gerrit:883952{{!}}Enable write new for CheckUserLog comment fields everywhere (T233004)]]
* 10:00 moritzm: installing remaining openldap security updates (client-side libs, tools)
* 21:01 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
* 09:52 jbond42: enable remember-me feature of CAS
* 20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 and remove db1103:3314 from vslow in s4', diff saved to https://phabricator.wikimedia.org/P11159 and previous config saved to /var/cache/conftool/dbconfig/20200506-093940-marostegui.json
* 20:36 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
* 09:12 marostegui: Upgrade package on s3 and s7 master (db1123 and db1086) in preparation for tomorrow's restart - [[phab:T251158|T251158]]
* 20:13 ryankemper: `ryankemper@thanos-fe1001:~$ sudo run-puppet-agent` following merge of wdqs recording rule patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/883610
* 08:56 jbond42: restarting ps1-a4-eqiad.mgmt.eqiad.wmnet.
* 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
* 08:53 jynus: kill FTWRL on db2101
* 20:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
* 08:43 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Reverting change on mw1407 [[phab:T99740|T99740]] (duration: 01m 16s)
* 20:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
* 08:02 _joe_: restarted php-fpm with tweaked parameters on mw1407, now briefly pooling for traffic ([[phab:T99740|T99740]])
* 19:56 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4038.ulsfo.wmnet with OS bullseye
* 07:38 kormat@cumin1001: dbctl commit (dc=all): 'Set es1023 (es5 master) to 0 weight after reimaging es1024 [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11158 and previous config saved to /var/cache/conftool/dbconfig/20200506-073856-kormat.json
* 19:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
* 07:32 vgutierrez: downgrade to ATS 8.0.7-1wm3 on cp4026, cp4031, cp5006 and cp5011
* 19:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
* 06:00 elukey: powercycle analytics1060 - host stuck - [[phab:T251973|T251973]]
* 19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]]
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1103:3314 in vslow on s4 while db1121 is out [[phab:T250055|T250055]]', diff saved to https://phabricator.wikimedia.org/P11157 and previous config saved to /var/cache/conftool/dbconfig/20200506-050340-marostegui.json
* 19:00 brennen: 1.40.0-wmf.20 train ([[phab:T325583|T325583]]): no current blockers, rolling to all wikis.
* 05:02 marostegui: Deploy schema change on db1121
* 18:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
* 18:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet
* 18:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS bullseye
* 18:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
* 18:17 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
* 18:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 18:16 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 18:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 18:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 18:15 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 18:15 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 18:15 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 18:15 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 18:15 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 18:15 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 18:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 18:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 18:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 18:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 18:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 18:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 18:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 18:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 18:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 18:10 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 18:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 18:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 17:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS bullseye
* 17:55 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
* 17:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS bullseye
* 17:30 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43427 and previous config saved to /var/cache/conftool/dbconfig/20230126-172806-root.json
* 17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 17:24 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
* 17:24 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 17:22 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet
* 17:19 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
* 17:19 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 17:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43426 and previous config saved to /var/cache/conftool/dbconfig/20230126-171302-root.json
* 17:12 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet
* 17:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
* 17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
* 17:06 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet
* 17:06 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS bullseye
* 17:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
* 17:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 17:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 17:04 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet
* 17:03 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS bullseye
* 17:02 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet
* 16:59 cgoubert@deploy1002: Synchronized tox.ini: Rebuilding mediawiki-webserver (duration: 07m 19s)
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43425 and previous config saved to /var/cache/conftool/dbconfig/20230126-165757-root.json
* 16:56 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet
* 16:53 claime: Running scap sync-file -D php_fpm_restart_script:/bin/true tox.ini "Rebuilding mediawiki-webserver image" - [[phab:T326794|T326794]]
* 16:51 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 16:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2027']
* 16:48 sukhe: correcting earlier log: pooling lvs2007 after [[phab:T326564|T326564]]
* 16:48 sukhe: pooling lvs2009 after [[phab:T326564|T326564]]
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43424 and previous config saved to /var/cache/conftool/dbconfig/20230126-164252-root.json
* 16:41 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
* 16:41 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2027']
* 16:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
* 16:38 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
* 16:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
* 16:31 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
* 16:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 16:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
* 16:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
* 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43423 and previous config saved to /var/cache/conftool/dbconfig/20230126-162747-root.json
* 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 16:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
* 16:24 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001-dev
* 16:23 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001-dev
* 16:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
* 16:21 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet
* 16:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 16:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 16:20 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 16:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
* 16:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 16:19 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 16:18 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS bullseye
* 16:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
* 16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: [[phab:T323717|T323717]]
* 16:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: [[phab:T323717|T323717]]
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43422 and previous config saved to /var/cache/conftool/dbconfig/20230126-161242-root.json
* 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 [[phab:T328024|T328024]]', diff saved to https://phabricator.wikimedia.org/P43421 and previous config saved to /var/cache/conftool/dbconfig/20230126-161137-root.json
* 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary [[phab:T328024|T328024]]', diff saved to https://phabricator.wikimedia.org/P43420 and previous config saved to /var/cache/conftool/dbconfig/20230126-161058-marostegui.json
* 16:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - [[phab:T328024|T328024]]
* 16:09 moritzm: installing distro-info-data updates from Bullseye point release
* 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw2001-dev.codfw.wmnet
* 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 16:06 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 16:05 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1009.eqiad.wmnet
* 15:55 jbond: enable-puppet post deploy requestctl ferm change gerrit:883935
* 15:55 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 15:51 hashar: Restarting CI Jenkins for upgrade
* 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s8 [[phab:T328024|T328024]]
* 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 [[phab:T328024|T328024]]', diff saved to https://phabricator.wikimedia.org/P43419 and previous config saved to /var/cache/conftool/dbconfig/20230126-155000-root.json
* 15:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s8 [[phab:T328024|T328024]]
* 15:49 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudgw2001-dev.codfw.wmnet
* 15:46 hashar: Restart Jenkins for upgrade
* 15:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1009.eqiad.wmnet
* 15:30 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
* 15:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 15:30 sukhe: install2003: rm /etc/dhcp/automation/ttyS1-115200/cp2027.conf
* 15:29 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
* 15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 15:27 sukhe: poweroff lvs2007: [[phab:T326564|T326564]]
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43418 and previous config saved to /var/cache/conftool/dbconfig/20230126-152329-root.json
* 15:12 jbond: disable-puppet deploy requestctl ferm change gerrit:883935
* 15:09 sukhe: stop pybal on lvs2007: [[phab:T326564|T326564]]
* 15:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for [[phab:T326564|T326564]]
* 15:09 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for [[phab:T326564|T326564]]
* 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43417 and previous config saved to /var/cache/conftool/dbconfig/20230126-150824-root.json
* 15:04 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
* 15:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 15:02 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
* 15:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 14:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43415 and previous config saved to /var/cache/conftool/dbconfig/20230126-145319-root.json
* 14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
* 14:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 14:39 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43414 and previous config saved to /var/cache/conftool/dbconfig/20230126-143814-root.json
* 14:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
* 14:37 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 14:36 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
* 14:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 14:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 14:31 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiadmin password ([[phab:T326802|T326802]]) (duration: 07m 04s)
* 14:27 moritzm: installing containerd security updates
* 14:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43413 and previous config saved to /var/cache/conftool/dbconfig/20230126-142309-root.json
* 14:16 Lucas_WMDE: UTC afternoon backport+config window done
* 14:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:883122{{!}}Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004)]] (duration: 09m 16s)
* 14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:11 jbond: disable puppet fleet-wide to roll out etcd ferm change gerrit:883888
* 14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:09 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43412 and previous config saved to /var/cache/conftool/dbconfig/20230126-140804-root.json
* 14:07 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Backport for [[gerrit:883122{{!}}Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 [[phab:T328023|T328023]]', diff saved to https://phabricator.wikimedia.org/P43411 and previous config saved to /var/cache/conftool/dbconfig/20230126-140716-root.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 primary [[phab:T328023|T328023]]', diff saved to https://phabricator.wikimedia.org/P43410 and previous config saved to /var/cache/conftool/dbconfig/20230126-140630-root.json
* 14:06 marostegui: Starting s5 codfw failover from db2123 to db2113 - [[phab:T328023|T328023]]
* 14:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:883122{{!}}Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004)]]
* 14:00 moritzm: restarting etherpad-lite to pick up nodejs security update
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Remove vslow from db2113, future s5 codfw master [[phab:T328023|T328023]]', diff saved to https://phabricator.wikimedia.org/P43409 and previous config saved to /var/cache/conftool/dbconfig/20230126-135509-marostegui.json
* 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T328023|T328023]]
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 [[phab:T328023|T328023]]', diff saved to https://phabricator.wikimedia.org/P43408 and previous config saved to /var/cache/conftool/dbconfig/20230126-135215-root.json
* 13:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T328023|T328023]]
* 13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
* 13:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
* 13:32 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:883723{{!}}Change time zone setting on gorwiktionary (T327986)]] (duration: 12m 02s)
* 13:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:25 moritzm: restarting turnilo for nodejs security update
* 13:22 ladsgroup@deploy1002: superpes and ladsgroup: Backport for [[gerrit:883723{{!}}Change time zone setting on gorwiktionary (T327986)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 13:20 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:883723{{!}}Change time zone setting on gorwiktionary (T327986)]]
* 13:10 moritzm: installing nodejs security updates on bullseye
* 13:09 hashar: Rebooting gerrit2002.wikimedia.org host to validate that the Apache 2 service starts AFTER the network comes online {{!}} [[phab:T326125|T326125]]
* 13:06 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
* 12:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
* 12:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp3051.esams.wmnet with reason: [[phab:T323717|T323717]]
* 12:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp3051.esams.wmnet with reason: [[phab:T323717|T323717]]
* 12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=ats-be
* 12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=cdn
* 12:41 sukhe: depool cp3051.esams.wmnet for firmware update testing: [[phab:T323717|T323717]]
* 12:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
* 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
* 12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
* 12:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
* 12:10 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 12:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
* 12:03 jbond: enable profile::base::firewall::defs_from_etcd: true globally
* 11:56 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-client-ssl._tcp.wikimedia.org on all recursors
* 11:56 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd-client-ssl._tcp.wikimedia.org on all recursors
* 11:49 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 11:49 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
* 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001
* 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
* 11:46 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
* 11:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:40 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts flowspec1001
* 11:36 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux
* 11:29 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
* 11:29 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
* 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43405 and previous config saved to /var/cache/conftool/dbconfig/20230126-110822-root.json
* 11:03 hashar: Restarted Apache 2 on gerrit.wikimedia.org
* 10:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
* 10:54 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 10:54 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43404 and previous config saved to /var/cache/conftool/dbconfig/20230126-105317-root.json
* 10:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 10:46 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
* 10:45 moritzm: installing postgresql-13 security updates
* 10:43 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
* 10:42 joal@deploy1002: Finished deploy [airflow-dags/analytics@e52205b]: (no justification provided) (duration: 00m 11s)
* 10:42 joal@deploy1002: Started deploy [airflow-dags/analytics@e52205b]: (no justification provided)
* 10:41 claime: cgoubert@authdns1001:~$ sudo -i authdns-update
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json
* 10:32 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s)
* 10:31 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435]
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json
* 10:21 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s)
* 10:21 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435]
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json
* 10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43399 and previous config saved to /var/cache/conftool/dbconfig/20230126-100802-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43398 and previous config saved to /var/cache/conftool/dbconfig/20230126-100438-root.json
* 09:59 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 09:58 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435] (duration: 01m 08s)
* 09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435]
* 09:57 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435] (duration: 00m 05s)
* 09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435]
* 09:56 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435] (duration: 07m 00s)
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43397 and previous config saved to /var/cache/conftool/dbconfig/20230126-095257-root.json
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43396 and previous config saved to /var/cache/conftool/dbconfig/20230126-095205-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43395 and previous config saved to /var/cache/conftool/dbconfig/20230126-094933-root.json
* 09:49 joal@deploy1002: Started deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435]
* 09:48 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:48 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:47 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:47 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43394 and previous config saved to /var/cache/conftool/dbconfig/20230126-093700-root.json
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43393 and previous config saved to /var/cache/conftool/dbconfig/20230126-093620-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43392 and previous config saved to /var/cache/conftool/dbconfig/20230126-093428-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43391 and previous config saved to /var/cache/conftool/dbconfig/20230126-093303-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary [[phab:T313811|T313811]]', diff saved to https://phabricator.wikimedia.org/P43390 and previous config saved to /var/cache/conftool/dbconfig/20230126-092512-root.json
* 09:24 marostegui: Starting x2 codfw failover from db2142 to db2144 - [[phab:T328001|T328001]]
* 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 [[phab:T328001|T328001]]
* 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 [[phab:T328001|T328001]]
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43389 and previous config saved to /var/cache/conftool/dbconfig/20230126-092155-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43388 and previous config saved to /var/cache/conftool/dbconfig/20230126-092115-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43387 and previous config saved to /var/cache/conftool/dbconfig/20230126-091923-root.json
* 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 [[phab:T328001|T328001]]
* 09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 [[phab:T328001|T328001]]
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43386 and previous config saved to /var/cache/conftool/dbconfig/20230126-091758-root.json
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43385 and previous config saved to /var/cache/conftool/dbconfig/20230126-090650-root.json
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43384 and previous config saved to /var/cache/conftool/dbconfig/20230126-090610-root.json
* 09:05 phedenskog@deploy1002: Finished deploy [performance/navtiming@e5fdd6e]: (no justification provided) (duration: 00m 06s)
* 09:05 phedenskog@deploy1002: Started deploy [performance/navtiming@e5fdd6e]: (no justification provided)
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43383 and previous config saved to /var/cache/conftool/dbconfig/20230126-090418-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 [[phab:T328000|T328000]]', diff saved to https://phabricator.wikimedia.org/P43382 and previous config saved to /var/cache/conftool/dbconfig/20230126-090302-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43381 and previous config saved to /var/cache/conftool/dbconfig/20230126-090253-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary [[phab:T328000|T328000]]', diff saved to https://phabricator.wikimedia.org/P43380 and previous config saved to /var/cache/conftool/dbconfig/20230126-090212-root.json
* 09:02 marostegui: Starting s7 codfw failover from db2121 to db2118 - [[phab:T328000|T328000]]
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43379 and previous config saved to /var/cache/conftool/dbconfig/20230126-085145-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43378 and previous config saved to /var/cache/conftool/dbconfig/20230126-085105-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43377 and previous config saved to /var/cache/conftool/dbconfig/20230126-084748-root.json
* 08:44 moritzm: added Eoghan to pwstore
* 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T328000|T328000]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 [[phab:T328000|T328000]]', diff saved to https://phabricator.wikimedia.org/P43376 and previous config saved to /var/cache/conftool/dbconfig/20230126-084112-root.json
* 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T328000|T328000]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43375 and previous config saved to /var/cache/conftool/dbconfig/20230126-083640-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43374 and previous config saved to /var/cache/conftool/dbconfig/20230126-083600-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 [[phab:T327999|T327999]]', diff saved to https://phabricator.wikimedia.org/P43373 and previous config saved to /var/cache/conftool/dbconfig/20230126-083543-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 primary [[phab:T327999|T327999]]', diff saved to https://phabricator.wikimedia.org/P43372 and previous config saved to /var/cache/conftool/dbconfig/20230126-083459-root.json
* 08:34 marostegui: Starting s3 codfw failover from db2105 to db2127 - [[phab:T327999|T327999]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43371 and previous config saved to /var/cache/conftool/dbconfig/20230126-083243-root.json
* 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 [[phab:T327999|T327999]]
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 [[phab:T327999|T327999]]', diff saved to https://phabricator.wikimedia.org/P43370 and previous config saved to /var/cache/conftool/dbconfig/20230126-082432-root.json
* 08:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 [[phab:T327999|T327999]]
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43369 and previous config saved to /var/cache/conftool/dbconfig/20230126-082055-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43368 and previous config saved to /var/cache/conftool/dbconfig/20230126-082038-root.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T327998|T327998]]', diff saved to https://phabricator.wikimedia.org/P43367 and previous config saved to /var/cache/conftool/dbconfig/20230126-081916-root.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 primary [[phab:T327998|T327998]]', diff saved to https://phabricator.wikimedia.org/P43366 and previous config saved to /var/cache/conftool/dbconfig/20230126-081818-root.json
* 08:17 marostegui: Starting s2 codfw failover from db2104 to db2107 - [[phab:T327998|T327998]]
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43365 and previous config saved to /var/cache/conftool/dbconfig/20230126-081738-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43364 and previous config saved to /var/cache/conftool/dbconfig/20230126-080533-root.json
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 [[phab:T327998|T327998]]
* 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 [[phab:T327998|T327998]]
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 [[phab:T327998|T327998]]', diff saved to https://phabricator.wikimedia.org/P43363 and previous config saved to /var/cache/conftool/dbconfig/20230126-080427-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43362 and previous config saved to /var/cache/conftool/dbconfig/20230126-080233-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 [[phab:T327997|T327997]]', diff saved to https://phabricator.wikimedia.org/P43361 and previous config saved to /var/cache/conftool/dbconfig/20230126-080159-root.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary [[phab:T327997|T327997]]', diff saved to https://phabricator.wikimedia.org/P43360 and previous config saved to /var/cache/conftool/dbconfig/20230126-080033-root.json
* 08:00 marostegui: Starting s1 codfw failover from db2103 to db2112 - [[phab:T327997|T327997]]
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43359 and previous config saved to /var/cache/conftool/dbconfig/20230126-075028-root.json
* 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2012.*
* 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2011.*
* 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2010.*
* 07:48 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2009.*
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 [[phab:T327997|T327997]]', diff saved to https://phabricator.wikimedia.org/P43358 and previous config saved to /var/cache/conftool/dbconfig/20230126-073616-root.json
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s1 [[phab:T327997|T327997]]
* 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s1 [[phab:T327997|T327997]]
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43357 and previous config saved to /var/cache/conftool/dbconfig/20230126-073523-root.json
* 07:25 marostegui@deploy1002: Finished scap: Backport for [[gerrit:883699{{!}}ProductionServices.php: Depool pc2011 (T327925)]] (duration: 11m 19s)
* 07:25 dcausse: [[phab:T322869|T322869]]: depooling wdqs2009 wdqs2010 wdqs2011 wdqs2012; these hosts should not serve user traffic yet, as they don't have the database loaded
* 07:23 marostegui: Failover m1 from db1195 to db1176 - [[phab:T327800|T327800]]
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43356 and previous config saved to /var/cache/conftool/dbconfig/20230126-072017-root.json
* 07:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
* 07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
* 07:17 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
* 07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
* 07:16 marostegui@deploy1002: marostegui: Backport for [[gerrit:883699{{!}}ProductionServices.php: Depool pc2011 (T327925)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:14 marostegui@deploy1002: Started scap: Backport for [[gerrit:883699{{!}}ProductionServices.php: Depool pc2011 (T327925)]]
* 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 [[phab:T327800|T327800]]
* 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 [[phab:T327800|T327800]]
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43354 and previous config saved to /var/cache/conftool/dbconfig/20230126-070512-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db1103', diff saved to https://phabricator.wikimedia.org/P43353 and previous config saved to /var/cache/conftool/dbconfig/20230126-070220-marostegui.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 [[phab:T327861|T327861]]', diff saved to https://phabricator.wikimedia.org/P43352 and previous config saved to /var/cache/conftool/dbconfig/20230126-070158-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write [[phab:T327861|T327861]]', diff saved to https://phabricator.wikimedia.org/P43351 and previous config saved to /var/cache/conftool/dbconfig/20230126-070035-marostegui.json
* 07:00 marostegui: Starting x1 eqiad failover from db1120 to db1103 - [[phab:T327861|T327861]]
* 06:48 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6015.drmrs.wmnet
* 06:48 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS bullseye
* 06:32 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiuser password ([[phab:T326802|T326802]]) (duration: 07m 23s)
* 06:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
* 06:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 with weight 0 [[phab:T327861|T327861]]', diff saved to https://phabricator.wikimedia.org/P43350 and previous config saved to /var/cache/conftool/dbconfig/20230126-061751-root.json
* 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T327861|T327861]]
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T327861|T327861]]
* 05:57 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS bullseye
* 05:53 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6006.drmrs.wmnet
* 05:53 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS bullseye
* 05:32 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
* 05:28 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
* 05:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS bullseye
* 05:09 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6014.drmrs.wmnet
* 05:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS bullseye
* 04:45 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
* 04:42 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
* 04:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS bullseye
* 04:22 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6005.drmrs.wmnet
* 04:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS bullseye
* 03:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
* 03:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
* 03:29 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS bullseye
* 03:27 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6013.drmrs.wmnet
* 03:27 ejegg: payments-wiki upgraded from {{Gerrit|08b8c3bc}} to {{Gerrit|82d89841}}
* 03:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS bullseye
* 03:04 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
* 03:01 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
* 02:41 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS bullseye
* 02:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
* 02:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 02:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
* 01:58 ejegg: restarted fundraising scheduled jobs after queue server reboot
* 01:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
* 01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=ats-be
* 01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=cdn
* 01:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
* 01:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
* 01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=ats-be
* 01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=cdn
* 01:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2028.codfw.wmnet with OS bullseye
* 01:24 ejegg: payments-wiki upgraded from {{Gerrit|15395d05}} to {{Gerrit|08b8c3bc}} (upgraded from MW 1.35 to MW 1.39)
* 01:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
* 01:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
* 01:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
* 01:14 ejegg: disabled fundraising scheduled jobs for queue server reboot
* 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2028.codfw.wmnet with OS bullseye
* 01:03 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
* 01:00 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
* 01:00 ejegg: turned pending transaction resolvers back on after civi deploy
* 00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
* 00:50 ejegg: civicrm upgraded from {{Gerrit|3e6b21b6}} to {{Gerrit|b5d6a790}}
* 00:50 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 00:49 sukhe: depool cp2028 for testing firmware update cookbook: [[phab:T321309|T321309]]
* 00:49 ejegg: disabled pending transaction resolvers for civi deploy
* 00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=ats-be
* 00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=cdn


== 2020-05-05 ==
== 2023-01-25 ==
* 23:44 catrope@deploy1001: Synchronized wmf-config/flaggedrevs.php: Restore the reviewer group on fawiki ([[phab:T249643|T249643]]) (duration: 01m 06s)
* 23:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6004.drmrs.wmnet
* 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3) (duration: 00m 11s)
* 23:57 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS bullseye
* 23:22 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3)
* 23:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
* 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 14s)
* 23:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
* 23:21 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
* 23:29 zabe@deploy1002: Finished scap: (no justification provided) (duration: 07m 34s)
* 23:21 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 20s)
* 23:21 zabe@deploy1002: Started scap: (no justification provided)
* 23:20 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
* 23:20 zabe@deploy1002: Backport cancelled.
* 22:00 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: [[phab:T251952|T251952]] take 2 (duration: 01m 06s)
* 23:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS bullseye
* 21:57 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: [[phab:T251952|T251952]] (duration: 01m 05s)
* 23:13 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6012.drmrs.wmnet
* 21:55 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/SpecialNewpages.php: [[phab:T251950|T251950]] (duration: 01m 06s)
* 23:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS bullseye
* 20:02 herron: added ryankemper to wmf and ops ldap groups [[phab:T251572|T251572]]
* 22:43 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 08s)
* 22:40 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
* 19:38 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 22:21 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS bullseye
* 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 25m 18s)
* 22:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6003.drmrs.wmnet
* 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
* 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 19:13 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 19:12 brennen@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.31 (duration: 97m 23s)
* 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 19:02 brennen: train status: 1.35.0-wmf.31: presently pressing enter through scap-cdb-rebuild; at 8% ([[phab:T249963|T249963]], [[phab:T223287|T223287]])
* 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 18:39 cdanis: depool mw2221 for some manual testing
* 21:34 samtar@deploy1002: Finished scap: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]] (duration: 09m 27s)
* 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 09s)
* 21:26 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]] synced to the testservers: mwdebug2002.cod
* 18:35 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 21:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 18:34 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 18m 54s)
* 21:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 18:15 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 21:24 samtar@deploy1002: Started scap: Backport for [[gerrit:883617{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]], [[gerrit:883616{{!}}Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)]]
* 17:35 brennen@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.31
* 21:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 16:48 brennen: 1.35.0-wmf.31 was branched at {{Gerrit|4d3fed31a435e7bd24925a154f89a9407670986d}} for [[phab:T249963|T249963]]
* 20:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 16:34 brennen: triggering branch cut for 1.35.0-wmf.31 ([[phab:T249963|T249963]]) via https://releases-jenkins.wikimedia.org/job/MediaWiki%20Train%20Branch%20Cut/build?delay=0sec
* 20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS bullseye
* 16:18 brennen: notice: planning branch cut for 1.35.0-wmf.31 ([[phab:T249963|T249963]]) at 16:30 UTC
* 20:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts cp2028.codfw.wmnet
* 15:47 cstone: SmashPig revision changed from {{Gerrit|8c30ed7fe5}} to {{Gerrit|cd1a49da5f}}
* 20:58 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 100% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11153 and previous config saved to /var/cache/conftool/dbconfig/20200505-153843-kormat.json
* 20:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 15:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
* 14:58 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb (duration: 01m 31s)
* 20:49 ejegg: updated employers.csv on paymentswiki
* 14:56 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb
* 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
* 14:45 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 20:33 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
* 14:43 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 20:32 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
* 14:32 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 75% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11149 and previous config saved to /var/cache/conftool/dbconfig/20200505-143158-kormat.json
* 20:30 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
* 13:46 akosiaris: deploy cxserver chart 0.0.15 to staging, codfw, eqiad. [[phab:T219921|T219921]]
* 20:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS bullseye
* 13:45 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 20:00 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6011.drmrs.wmnet
* 13:41 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 19:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS bullseye
* 13:41 hashar: Updated Jenkins job https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler to have it defined in JJB # [[phab:T97513|T97513]]
* 19:52 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
* 13:36 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 19:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 13:18 vgutierrez: upgrade ATS to version 8.1 () on cp4026, cp4032, cp5006 and cp5011
* 19:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 50% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11147 and previous config saved to /var/cache/conftool/dbconfig/20200505-131520-kormat.json
* 19:33 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
* 12:52 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 at 25% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11145 and previous config saved to /var/cache/conftool/dbconfig/20200505-125254-kormat.json
* 19:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 12:37 XioNoX: push pfw policy - [[phab:T251769|T251769]]
* 19:21 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 12:07 jbond42: updating cas login page
* 19:17 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]] (duration: 07m 04s)
* 12:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:12 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS bullseye
* 12:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs [[phab:T325583|T325583]]
* 12:03 moritzm: rolling restart of apache on puppetboard* to pick up OpenLDAP update
* 19:06 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6002.drmrs.wmnet
* 11:47 moritzm: rolling restart of apache on kibana hosts
* 19:01 brennen: 1.40.0-wmf.20 train ([[phab:T325583|T325583]]): no blockers, rolling to group1.
* 11:41 mutante: LDAP - added eamedia to wmf group ([[phab:T251358|T251358]])
* 19:00 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11144 and previous config saved to /var/cache/conftool/dbconfig/20200505-113152-marostegui.json
* 19:00 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11143 and previous config saved to /var/cache/conftool/dbconfig/20200505-113100-marostegui.json
* 18:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS bullseye
* 11:30 marostegui: Drop [[phab:T248086|T248086]]_wb_terms table on labsdb hosts - [[phab:T248086|T248086]]
* 18:37 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
* 11:26 moritzm: rolling restart of apache/FPM on mw1261-mw1265
* 18:35 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 11:22 kart_: EU SWAT done.
* 18:34 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
* 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}592479{{!}}Adjust ContentTranslation MT threshold for Chinese WP to 70% (T246383)]] (duration: 01m 01s)
* 18:33 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 11:01 moritzm: installing remaining openldap security updates (client-side libs, tools)
* 18:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:00 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1024 for reimaging, add es1023 (master) for reading in the meantime [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11141 and previous config saved to /var/cache/conftool/dbconfig/20200505-110031-kormat.json
* 18:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11140 and previous config saved to /var/cache/conftool/dbconfig/20200505-104540-marostegui.json
* 18:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS bullseye
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11139 and previous config saved to /var/cache/conftool/dbconfig/20200505-104441-marostegui.json
* 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 10:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 18:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 10:23 arturo: copy prometheus-rabbitmq-exporter v0.4 from stretch-wikimedia to buster-wikimedia in apt1001 ([[phab:T251660|T251660]])
* 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 10:18 arturo: copy prometheus-pdns-exporter v0.5.1 from stretch-wikimedia to buster-wikimedia in apt1001 ([[phab:T251575|T251575]])
* 18:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 10:16 mutante: temp disabling puppet on all ganeti hosts to carefully deploy change related to rapi cert location
* 18:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6010.drmrs.wmnet
* 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS bullseye
* 09:36 moritzm: removing boron.eqiad.wmnet
* 17:32 mutante: removing racktables.wikimedia.org from DNS - that's it for this ancient service [[phab:T327405|T327405]]
* 09:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
* 09:03 gehel: restarting wdqs updater on all servers
* 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
* 08:53 moritzm: installing Java security updates on releases*
* 16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS bullseye
* 08:44 kormat: reimaging es1024 to buster [[phab:T250666|T250666]]
* 16:50 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
* 08:27 ema: cp2028 and cp2030 (both upload): varnish-fe restart to clear cache and evaluate 'exp' admission policy [[phab:T144187|T144187]] [[phab:T249809|T249809]]
* 16:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
* 08:26 moritzm: upgrading slapd on serpens/seaborgium
* 16:43 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
* 08:19 ema: cp2027 and cp2029 (both text): varnish-fe restart to clear cache and evaluate 'exp' admission policy [[phab:T144187|T144187]] [[phab:T249809|T249809]]
* 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=ats-be
* 08:08 moritzm: installing Java security updates on notebook/stat hosts
* 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=cdn
* 07:54 gehel@deploy1001: Finished deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22 (duration: 04m 18s)
* 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
* 07:50 gehel@deploy1001: Started deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22
* 16:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
* 07:36 zpapierski@deploy1001: Started deploy [wdqs/wdqs@d37a059]: fix for the duplicated jars
* 16:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
* 06:59 addshore: depool wdqs1006 heavy lag
* 16:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS bullseye
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only=off for maintenance [[phab:T251154|T251154]]', diff saved to https://phabricator.wikimedia.org/P11133 and previous config saved to /var/cache/conftool/dbconfig/20200505-052334-marostegui.json
* 16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only for maintenance [[phab:T251154|T251154]]', diff saved to https://phabricator.wikimedia.org/P11132 and previous config saved to /var/cache/conftool/dbconfig/20200505-052058-marostegui.json
* 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
* 05:19 marostegui: Start s5 and s6 maintenance - [[phab:T251154|T251154]]
* 16:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 04:39 marostegui: Restart mysql on tendril host: db1115 - [[phab:T231769|T231769]]
* 16:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
* 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
* 16:04 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
* 16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
* 15:56 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']
* 15:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:50 robh: db1139 ilom wins/netbios disabled and ilom reset [[phab:T327877|T327877]]
* 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
* 15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
* 15:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:45 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
* 15:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
* 15:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']
* 15:44 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']
* 15:43 robh: netbios wins disabled on db1140 ilom and ilom reset [[phab:T327877|T327877]]
* 15:43 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
* 15:38 papaul: ongoing maintenance on fasw-c-eqiad
* 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
* 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 15:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
* 15:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
* 15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
* 15:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
* 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
* 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
* 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=cdn
* 15:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
* 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
* 15:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 15:12 urbanecm@deploy1002: Finished scap: triggering i18n refresh for [[phab:T327824|T327824]] (duration: 07m 57s)
* 15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
* 15:04 urbanecm@deploy1002: Started scap: triggering i18n refresh for [[phab:T327824|T327824]]
* 15:04 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]] (duration: 08m 43s)
* 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
* 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=cdn
* 15:01 urbanecm: Overrunning B&C window
* 14:57 urbanecm@deploy1002: urbanecm and migr: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
* 14:55 urbanecm@deploy1002: Started scap: Backport for [[gerrit:882615{{!}}Enable the Wikibase REST API on Wikidata (T324999)]]
* 14:53 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
* 14:53 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]] (duration: 32m 21s)
* 14:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
* 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
* 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6002.wikimedia.org
* 14:40 urbanecm@deploy1002: jakob and sgimeno and urbanecm: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors
* 14:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
* 14:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
* 14:25 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install6002.wikimedia.org
* 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5002.wikimedia.org
* 14:21 urbanecm@deploy1002: Started scap: Backport for [[gerrit:883224{{!}}REST: Use error log level for unexpected errors (T327490)]], [[gerrit:883547{{!}}User impact: amend incorrect parameter for the single day streak text (T327824)]]
* 14:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]] (duration: 12m 59s)
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5002.wikimedia.org on all recursors
* 14:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5002.wikimedia.org on all recursors
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
* 14:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
* 14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
* 14:05 urbanecm@deploy1002: aleksandar and urbanecm: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5002.wikimedia.org
* 14:03 urbanecm@deploy1002: Started scap: Backport for [[gerrit:883222{{!}}Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)]]
* 13:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
* 13:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
* 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
* 13:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
* 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3002.wikimedia.org
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3002.wikimedia.org on all recursors
* 13:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3002.wikimedia.org on all recursors
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
* 13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
* 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3002.wikimedia.org
* 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install2004.wikimedia.org
* 13:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
* 13:04 jbond: puppet now using vendored version of augeas-core https://gerrit.wikimedia.org/r/c/operations/puppet/+/883233
* 13:04 jbond: enable puppet fleet wide to post deploy gerrit:883233
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2004.wikimedia.org on all recursors
* 13:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2004.wikimedia.org on all recursors
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
* 12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
* 12:54 jbond: disable puppet fleet wide to deploy gerrit:883233
* 12:54 jnuche@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 21s)
* 12:54 jnuche@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 12:45 moritzm: restarting Exim on MXes to pick up new libtasn
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2003.codfw.wmnet
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2002.codfw.wmnet
* 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1003.eqiad.wmnet
* 12:42 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1002.eqiad.wmnet
* 12:41 moritzm: restarting slapd on r/w servers to pick up new libtasn
* 12:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2004.wikimedia.org
* 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1004.wikimedia.org
* 12:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1004.wikimedia.org on all recursors
* 12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1004.wikimedia.org on all recursors
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
* 12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
* 12:12 moritzm: installing libtasn security updates on buster
* 11:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install1004.wikimedia.org
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
* 11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
* 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
* 11:34 Lucas_WMDE: Updated the Wikidata property suggester with data from 20230102's JSON dump ([[phab:T325942|T325942]])
* 11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
* 11:27 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:12 hnowlan: restarting lvs on lvs1019 for thumbor healthcheck change
* 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43344 and previous config saved to /var/cache/conftool/dbconfig/20230125-111059-root.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43343 and previous config saved to /var/cache/conftool/dbconfig/20230125-110924-root.json
* 11:08 hnowlan: restarting lvs on lvs2009 for thumbor healthcheck change
* 11:00 hnowlan: restarting lvs on lvs1020 for thumbor healthcheck change
* 11:00 hnowlan: restarting lvs on lvs1010 for thumbor healthcheck change
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43342 and previous config saved to /var/cache/conftool/dbconfig/20230125-105554-root.json
* 10:54 hnowlan: restarting lvs on lvs2010 for thumbor healthcheck change
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43341 and previous config saved to /var/cache/conftool/dbconfig/20230125-105443-root.json
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43340 and previous config saved to /var/cache/conftool/dbconfig/20230125-105419-root.json
* 10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 10:43 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43338 and previous config saved to /var/cache/conftool/dbconfig/20230125-104049-root.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43337 and previous config saved to /var/cache/conftool/dbconfig/20230125-103938-root.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43336 and previous config saved to /var/cache/conftool/dbconfig/20230125-103914-root.json
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43335 and previous config saved to /var/cache/conftool/dbconfig/20230125-102544-root.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43334 and previous config saved to /var/cache/conftool/dbconfig/20230125-102433-root.json
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43333 and previous config saved to /var/cache/conftool/dbconfig/20230125-102409-root.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43332 and previous config saved to /var/cache/conftool/dbconfig/20230125-101039-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43331 and previous config saved to /var/cache/conftool/dbconfig/20230125-100928-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43330 and previous config saved to /var/cache/conftool/dbconfig/20230125-100904-root.json
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43329 and previous config saved to /var/cache/conftool/dbconfig/20230125-095534-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43328 and previous config saved to /var/cache/conftool/dbconfig/20230125-095423-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43327 and previous config saved to /var/cache/conftool/dbconfig/20230125-095400-root.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43326 and previous config saved to /var/cache/conftool/dbconfig/20230125-094029-root.json
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43325 and previous config saved to /var/cache/conftool/dbconfig/20230125-093918-root.json
* 09:30 Emperor: rolling depool & update of thanos front-ends [[phab:T327871|T327871]]
* 08:40 XioNoX: bump SGIX max prefix limit
* 08:13 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]] (duration: 10m 13s)
* 08:05 ladsgroup@deploy1002: ladsgroup and aleksandar: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:03 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:883221{{!}}Add sandbox link to Serbo-Croatian Wikipedia (T327833)]]
* 07:49 marostegui: Cloning db1196 from db1206 (lag will appear on s1 wiki replicas) [[phab:T327859|T327859]]
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 to clone db1196 [[phab:T327859|T327859]]', diff saved to https://phabricator.wikimedia.org/P43322 and previous config saved to /var/cache/conftool/dbconfig/20230125-074601-marostegui.json
* 07:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfff15d]: (no justification provided) (duration: 00m 05s)
* 07:34 phedenskog@deploy1002: Started deploy [performance/navtiming@bfff15d]: (no justification provided)
* 07:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
* 07:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1198', diff saved to https://phabricator.wikimedia.org/P43320 and previous config saved to /var/cache/conftool/dbconfig/20230125-072033-marostegui.json
* 07:08 AndyRussG: updated payments (config only) revision {{Gerrit|15395d05}}, config {{Gerrit|418160e9}}
* 04:10 eileen: config revision changed from {{Gerrit|dc0a0d3a}} to {{Gerrit|089d0acb}}
* 04:01 eileen: civicrm upgraded from {{Gerrit|9197ca29}} to {{Gerrit|3e6b21b6}}
* 03:27 eileen: civicrm upgraded from {{Gerrit|f6093fb2}} to {{Gerrit|9197ca29}}
* 03:05 eileen: config revision changed from {{Gerrit|3f641fce}} to {{Gerrit|dc0a0d3a}}
* 01:17 legoktm: adjusting Gerrit group "Campaigns Team" so it is not recursively a member of itself
* 00:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
* 00:10 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye


== 2020-05-04 ==
== 2023-01-24 ==
* 23:38 mstyles@deploy1001: Finished deploy [wdqs/wdqs@6518a8d]: v.0.3.26 (duration: 14m 39s)
* 23:10 zabe@deploy1002: Finished scap: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]] (duration: 08m 02s)
* 23:37 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Use namespaced EventBus classes (duration: 00m 57s)
* 23:04 zabe@deploy1002: zabe: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:35 reedy@deploy1001: Synchronized wmf-config/logging.php: Use namespaced EventBus classes (duration: 00m 56s)
* 23:02 zabe@deploy1002: Started scap: Backport for [[gerrit:883281{{!}}Start reading from rev_comment_id on testcommonswiki (T299954)]]
* 23:33 reedy@deploy1001: Synchronized rpc/RunSingleJob.php: Use namespaced EventBus classes (duration: 00m 58s)
* 22:47 TheresNoTime: closing UTC late backport window
* 23:29 reedy@deploy1001: Synchronized wmf-config/logging.php: Replace AuthManagerStatsdHandler with WikimediaEventsAuthManagerStatsdHandler::class (duration: 00m 57s)
* 22:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]] (duration: 09m 04s)
* 23:23 mstyles@deploy1001: Started deploy [wdqs/wdqs@6518a8d]: v.0.3.26
* 22:39 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:42 sbassett@deploy1001: Synchronized private/PrivateSettings.php: [[phab:T251835|T251835]]: Restore {{Gerrit|dc752af1e94684faacbe9662789815c6edbbdf46}} (duration: 00m 57s)
* 22:37 samtar@deploy1002: Started scap: Backport for [[gerrit:883212{{!}}Add temporary extra grid-area for content translation extension (T327715)]], [[gerrit:883217{{!}}Add temporary extra grid-area for content translation extension (T327715)]]
* 22:16 eileen: process-control config revision is {{Gerrit|2eb75f8dff}}
* 22:30 samtar@deploy1002: Finished scap: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]] (duration: 07m 59s)
* 22:06 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 22:23 samtar@deploy1002: jforrester and samtar and stang: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:45 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Revert partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 22:22 samtar@deploy1002: Started scap: Backport for [[gerrit:883282{{!}}[BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724)]], [[gerrit:883285{{!}}newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)]]
* 21:41 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deploy partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 22:20 samtar@deploy1002: Finished scap: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]] (duration: 09m 02s)
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - [[phab:T249822|T249822]], [[phab:T238086|T238086]] (duration: 00m 05s)
* 22:19 mutante: DNS - adding new project language "gur" (Gurenɛ) - Gurenɛ is a major language of northern Ghana and the predominant language of the Upper East Region of Ghana. It is also widely spoken in Burkina Faso. [[phab:T327813|T327813]]
* 18:19 dpifke@deploy1001: Started deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - [[phab:T249822|T249822]], [[phab:T238086|T238086]]
* 22:13 samtar@deploy1002: samtar and stang: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 18:16 Urbanecm: Morning SWAT done
* 22:11 samtar@deploy1002: Started scap: Backport for [[gerrit:882681{{!}}newiki: Add new permissions to group reviewer (T327114)]]
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c04fbdd}}: Adding upload_by_url user right to all registered users on Commons ([[phab:T251474|T251474]]) (duration: 00m 57s)
* 22:08 samtar@deploy1002: Finished scap: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]] (duration: 09m 36s)
* 18:11 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/DiscussionTools/includes/DiscussionToolsHooks.php: SWAT: {{Gerrit|b85fc16}}: Enable on all ExtraSignaturesNamespaces ([[phab:T249036|T249036]]) (duration: 01m 00s)
* 22:06 TheresNoTime: extending UTC late backport window due to late start
* 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|18c1efb}}: Load DiscussionTools on en.wiki ([[phab:T249376|T249376]]) (duration: 00m 58s)
* 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=ats-be
* 17:57 XioNoX: configure singtel interface on cr1-eqsin
* 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=cdn
* 17:36 volans: upgraded spicerack on cumin[12]001 to 0.0.33-1
* 22:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS bullseye
* 17:02 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [{{Gerrit|2252f9a}}] (duration: 00m 09s)
* 22:00 samtar@deploy1002: samtar and jdrewniak: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 17:02 joal@deploy1001: Started deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [{{Gerrit|2252f9a}}]
* 21:59 samtar@deploy1002: Started scap: Backport for [[gerrit:883213{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]], [[gerrit:883216{{!}}Fix Wikitext editor preview layout in Vector 2022 (T327778)]]
* 17:01 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [{{Gerrit|2252f9a}}] (duration: 16m 45s)
* 21:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]] (duration: 13m 31s)
* 16:44 joal@deploy1001: Started deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [{{Gerrit|2252f9a}}]
* 21:45 samtar@deploy1002: nray and samtar: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.30
* 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
* 15:59 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 05s)
* 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 15:58 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30
* 21:43 samtar@deploy1002: Started scap: Backport for [[gerrit:882727{{!}}Work around sticky-positioned layers disabling subpixel rendering (T327460)]]
* 15:53 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 21:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 15:53 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
* 21:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
* 15:53 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 21:38 zabe: running migrateRevisionCommentTemp.php on testcommonswiki (s4) with --sleep 10 # [[phab:T275246|T275246]]
* 15:52 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 21:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
* 15:52 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 21:32 samtar@deploy1002: backport aborted: (duration: 06m 28s)
* 15:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2025 after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11128 and previous config saved to /var/cache/conftool/dbconfig/20200504-154747-kormat.json
* 21:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
* 15:45 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/includes/libs/rdbms/database/DatabaseMysqlBase.php: [[phab:T251457|T251457]] rdbms: don't treat lock() as a write operation (duration: 01m 04s)
* 21:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
* 15:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/resources/src/mediawiki.diff.styles/diff.less: [[phab:T250393|T250393]] Follow-up {{Gerrit|I07dd6f7}}: Fix font size in diff (duration: 01m 05s)
* 21:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS bullseye
* 15:34 volans: uploaded spicerack_0.0.33-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 21:05 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 15:26 volans: deploy1001: deleted old .hhvm.hhbc files (/home/*/.hhvm.hhbc) https://phabricator.wikimedia.org/P11127
* 21:03 TheresNoTime: holding UTC late backport window for outage, [[phab:T327815|T327815]]
* 15:23 volans: deploy1001: deleted old .hhvm.hhbc files moved from tin (/home/*/home-tin/.hhvm.hhbc) https://phabricator.wikimedia.org/P11126
* 21:01 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
* 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 fully after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11125 and previous config saved to /var/cache/conftool/dbconfig/20200504-151243-kormat.json
* 20:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 15:11 ppchelko@deploy1001: Finished deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints (duration: 14m 36s)
* 20:50 urandom: rebooting sessionstore1001.eqiad.wmnet -- [[phab:T325132|T325132]]
* 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [{{Gerrit|3396279}}] (duration: 00m 10s)
* 20:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host sessionstore1001.eqiad.wmnet
* 15:05 joal@deploy1001: Started deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [{{Gerrit|3396279}}]
* 20:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [{{Gerrit|3396279}}] (duration: 15m 07s)
* 20:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
* 15:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
* 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
* 14:57 ppchelko@deploy1001: Started deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints
* 20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
* 14:50 joal@deploy1001: Started deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [{{Gerrit|3396279}}]
* 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=cdn
* 14:19 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 fully and db1101:3318 to 75% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11123 and previous config saved to /var/cache/conftool/dbconfig/20200504-141919-kormat.json
* 20:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet
* 14:15 XioNoX: add static nat for fran1001 - [[phab:T251763|T251763]]
* 20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5025.eqsin.wmnet with OS bullseye
* 13:50 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2025 for reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11122 and previous config saved to /var/cache/conftool/dbconfig/20200504-135039-kormat.json
* 20:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet
* 13:34 kormat: reimaging es2025 to buster [[phab:T250666|T250666]]
* 20:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
* 13:27 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 some more after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11121 and previous config saved to /var/cache/conftool/dbconfig/20200504-132744-kormat.json
* 20:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
* 13:02 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T248664|T248664]] Stop setting legacy wmgWikibase(Repo/Client)Repositories for TEST wikis (duration: 01m 06s)
* 20:20 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=ats-be
* 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11120 and previous config saved to /var/cache/conftool/dbconfig/20200504-124659-kormat.json
* 20:19 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=cdn
* 12:10 marostegui: Temporarily enable slow query log on db1099:3311 - [[phab:T206103|T206103]]
* 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=cdn
* 12:09 Amir1: EU SWAT is done
* 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=ats-be
* 11:53 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592761{{!}}Increase wmgMemoryLimit from 660MB to 666MB]] (duration: 01m 06s)
* 20:16 bblack: pool cp5032
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 [[phab:T206103|T206103]] after removing tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11119 and previous config saved to /var/cache/conftool/dbconfig/20200504-114727-marostegui.json
* 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=ats-be
* 11:46 tgr@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594134{{!}}Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 06s)
* 20:16 mutante: contint2001 - restarted zuul
* 11:46 marostegui: Remove index tmp_2 from recentchanges on db1099:3311 [[phab:T206103|T206103]]
* 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=cdn
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T206103|T206103]] to remove tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11118 and previous config saved to /var/cache/conftool/dbconfig/20200504-114539-marostegui.json
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
* 11:43 tgr@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594137{{!}}Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 10s)
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=cdn
* 11:38 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=ats-be
* 11:30 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
* 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=cdn
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4d00236}}: Enable cross-project search on frwikibooks ([[phab:T251683|T251683]]) (duration: 01m 05s)
* 20:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
* 11:25 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/elwikiversity*.png ([[phab:T251050|T251050]])
* 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS bullseye
* 11:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|64556ba}}: Correct typo in Greek Wikiversity logo ([[phab:T248391|T248391]]) (duration: 01m 06s)
* 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
* 11:20 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/jvwiki*.png ([[phab:T251050|T251050]])
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=cdn
* 11:20 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|3b8c618}}: Update jvwiki logos ([[phab:T251050|T251050]]) (duration: 01m 05s)
* 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
* 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|cc94ea7}}: Enable VisualEditor for more namespaces on vecwiki ([[phab:T250419|T250419]]) (duration: 01m 07s)
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=ats-be
* 10:49 arturo: update packages in buster-wikimedia {{!}} thirdparty/kubeadm-k8s-1-15 and thirdparty/kubeadm-k8s-1-16 ([[phab:T250866|T250866]])
* 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=cdn
* 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:594128{{!}} Bumping portals to master (563985)]] (duration: 01m 05s)
* 20:05 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
* 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:594128{{!}} Bumping portals to master (563985)]] (duration: 01m 29s)
* 20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
* 10:39 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm3
* 19:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
* 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
* 10:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 19:54 sukhe: reprepro -C main include bullseye-wikimedia libvmod-netmapper_1.9-3_amd64.changes: [[phab:T326634|T326634]]
* 10:30 arturo: running `aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished` to cleanup buster-wikimedia{{!}}thirdparty/kubeadm-k8s ([[phab:T250866|T250866]])
* 19:53 sukhe: reprepro -C main include bullseye-wikimedia libvmod-re2_1.5.3-4_amd64.changes: [[phab:T326634|T326634]]
* 09:46 vgutierrez: upload trafficserver 8.0.7-1wm2 to apt.wm.o (buster)
* 19:53 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
* 09:22 kormat: reimaging db1101 to buster [[phab:T250666|T250666]]
* 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
* 08:50 XioNoX: configure BGP peering with AS132203
* 19:47 sukhe: reprepro -C main include bullseye-wikimedia libvmod-querysort_0.4_amd64.changes: [[phab:T326634|T326634]]
* 08:20 godog: add 50G to prometheus-ops on prometheus100[34]
* 19:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
* 08:17 marostegui: Deploy schema change on s5 codfw - [[phab:T251188|T251188]]
* 19:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for reimage', diff saved to https://phabricator.wikimedia.org/P11113 and previous config saved to /var/cache/conftool/dbconfig/20200504-075148-marostegui.json
* 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
* 07:31 marostegui: Drop unused flagged* tables from mediawikiwiki - [[phab:T248298|T248298]]
* 19:39 urandom: rebooting restbase cassandra nodes, row d -- [[phab:T325132|T325132]]
* 07:26 moritzm: removed jmorgan from cn=wmf
* 19:33 bblack: cp5032: restart varnish-frontend
* 07:24 marostegui: Install 10.1.43-2 on s5 (db110) and s6 (db1131) masters in preparation for tomorrow's restart - [[phab:T251154|T251154]]
* 19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
* 07:24 moritzm: removed Kerberos principal for lexnasser and jmorgan
* 19:28 sukhe: reprepro -C main include bullseye-wikimedia varnish-modules_0.15.0-3_amd64.changes: [[phab:T326634|T326634]]
* 07:23 moritzm: removed lexnasser from cn=nda
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
* 07:07 elukey: execute ifdown eno1; ifup eno1 on analytics1052 - interface neg speed flapping
* 19:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
* 06:41 elukey: upload prometheus-druid-exporter 0.8-1 to stretch-wikimedia
* 19:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
* 19:19 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
* 19:19 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5025.eqsin.wmnet with OS bullseye
* 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.20  refs [[phab:T325583|T325583]]
* 19:06 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
* 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
* 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 18:55 jynus: deploy new dump grants for analytics dbs at db1108 [[phab:T327155|T327155]]
* 18:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
* 18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS bullseye
* 18:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
* 18:14 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
* 18:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
* 18:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
* 17:44 bblack: cp5032: upgrading packages (varnish, trafficserver)
* 17:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2020.codfw.wmnet
* 17:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
* 17:36 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
* 17:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
* 17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
* 17:19 thcipriani: restarting ci jenkins for updates
* 17:13 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
* 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
* 17:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
* 17:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
* 17:04 urandom: rebooting restbase cassandra nodes, row c -- [[phab:T325132|T325132]]
* 16:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2042.codfw.wmnet with OS bullseye
* 16:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:23 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 16:23 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 16:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 16:22 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
* 16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 16:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
* 15:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:53 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2042.codfw.wmnet with OS bullseye
* 15:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:31 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 15:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 40s)
* 15:15 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
* 15:12 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 00m 33s)
* 15:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
* 14:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
* 14:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
* 14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
* 14:39 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
* 14:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:36 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:35 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:34 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 14:33 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:29 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 14:29 effie: switch maps (kartotherian) from eqiad to codfw (attempt #2)
* 14:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:25 TheresNoTime: close UTC afternoon backport window
* 14:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:20 XioNoX: repool ulsfo (maintenance over)
* 14:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
* 14:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]] (duration: 07m 41s)
* 14:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:11 samtar@deploy1002: daniel and samtar: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:09 samtar@deploy1002: Started scap: Backport for [[gerrit:868127{{!}}Increase PC writes from parsoid API to 10% (T320534)]]
* 13:50 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 13:44 XioNoX: reboot ulsfo switches for software upgrade
* 13:40 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1002.eqiad.wmnet
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1002.eqiad.wmnet
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2002.codfw.wmnet
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 13:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:10 topranks: enabling tunnel services on cr2-eqdfw fpc 0 pic 1
* 13:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:04 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2002.codfw.wmnet
* 12:56 zabe@deploy1002: Finished scap: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]] (duration: 44m 09s)
* 12:51 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 12:51 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
* 12:48 XioNoX: restart ulsfo switches for network maintenance
* 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
* 12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
* 12:40 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
* 12:38 zabe@deploy1002: zabe: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
* 12:12 zabe@deploy1002: Started scap: Backport for [[gerrit:881468{{!}}Remove PoolCounter from extension-list (T327336)]]
* 11:54 volans: uploaded python3-gjson_1.0.0 to apt.wikimedia.org bullseye-wikimedia,unstable-wikimedia
* 11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43311 and previous config saved to /var/cache/conftool/dbconfig/20230124-114255-root.json
* 11:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:36 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3002.esams.wmnet
* 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43310 and previous config saved to /var/cache/conftool/dbconfig/20230124-112750-root.json
* 11:26 zabe@deploy1002: Finished scap: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]] (duration: 09m 19s)
* 11:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
* 11:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping3002.esams.wmnet
* 11:19 zabe@deploy1002: zabe: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:17 zabe@deploy1002: Started scap: Backport for [[gerrit:881467{{!}}Stop loading PoolCounter extension (T327336)]]
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43308 and previous config saved to /var/cache/conftool/dbconfig/20230124-111245-root.json
* 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
* 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
* 11:03 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 11:03 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 11:03 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 11:02 effie: depooling maps (kartotherian) from codfw, leaving eqiad as pooled
* 11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:59 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
* 10:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:58 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 10:58 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43306 and previous config saved to /var/cache/conftool/dbconfig/20230124-105740-root.json
* 10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
* 10:52 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 10:49 XioNoX: depool ulsfo for network maintenance - [[phab:T316532|T316532]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl in s1 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43305 and previous config saved to /var/cache/conftool/dbconfig/20230124-104336-marostegui.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43304 and previous config saved to /var/cache/conftool/dbconfig/20230124-104235-root.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1176 from s1 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43303 and previous config saved to /var/cache/conftool/dbconfig/20230124-104219-root.json
* 10:33 vgutierrez: repool cp4046
* 10:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:31 vgutierrez: restarting varnish on cp4046
* 10:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:29 vgutierrez: depool cp4046
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43302 and previous config saved to /var/cache/conftool/dbconfig/20230124-102730-root.json
* 10:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 10:19 moritzm: rolling Apache/FPM restarts on mw canaries to pick up libtasn security update
* 10:19 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2165 [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43301 and previous config saved to /var/cache/conftool/dbconfig/20230124-101825-root.json
* 10:17 effie: depooling maps from eqiad && pooling maps on codfw
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43300 and previous config saved to /var/cache/conftool/dbconfig/20230124-101727-root.json
* 10:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:14 marostegui: Starting s8 codfw failover from db2165 to db2161 - [[phab:T327754|T327754]]
* 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2041.codfw.wmnet with OS bullseye
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43299 and previous config saved to /var/cache/conftool/dbconfig/20230124-101025-root.json
* 09:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
* 09:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
* 09:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43298 and previous config saved to /var/cache/conftool/dbconfig/20230124-095520-root.json
* 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s8 [[phab:T327754|T327754]]
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T327754|T327754]]', diff saved to https://phabricator.wikimedia.org/P43297 and previous config saved to /var/cache/conftool/dbconfig/20230124-095235-marostegui.json
* 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s8 [[phab:T327754|T327754]]
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43296 and previous config saved to /var/cache/conftool/dbconfig/20230124-094725-root.json
* 09:41 moritzm: installing libtasn1-6 security updates on buster
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43295 and previous config saved to /var/cache/conftool/dbconfig/20230124-094016-root.json
* 09:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 09:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2041.codfw.wmnet with OS bullseye
* 09:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43294 and previous config saved to /var/cache/conftool/dbconfig/20230124-093220-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43293 and previous config saved to /var/cache/conftool/dbconfig/20230124-092511-root.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43292 and previous config saved to /var/cache/conftool/dbconfig/20230124-091715-root.json
* 09:14 kart_: Done: UTC morning backport window
* 09:13 kartik@deploy1002: Finished scap: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]] (duration: 09m 44s)
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43291 and previous config saved to /var/cache/conftool/dbconfig/20230124-091006-root.json
* 09:05 kartik@deploy1002: awight and kartik: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:03 kartik@deploy1002: Started scap: Backport for [[gerrit:878853{{!}}Remove Kartographer versioned mapdata flags (T326288)]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43290 and previous config saved to /var/cache/conftool/dbconfig/20230124-090210-root.json
* 09:01 kartik@deploy1002: Finished scap: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]] (duration: 10m 42s)
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43289 and previous config saved to /var/cache/conftool/dbconfig/20230124-085501-root.json
* 08:52 kartik@deploy1002: awight and kartik: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:50 kartik@deploy1002: Started scap: Backport for [[gerrit:875463{{!}}Deprecate the EnableMapFrame feature flag (T326288)]]
* 08:48 kartik@deploy1002: Finished scap: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]] (duration: 15m 20s)
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43288 and previous config saved to /var/cache/conftool/dbconfig/20230124-084705-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db2115 in x1 codfw', diff saved to https://phabricator.wikimedia.org/P43287 and previous config saved to /var/cache/conftool/dbconfig/20230124-084552-marostegui.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43286 and previous config saved to /var/cache/conftool/dbconfig/20230124-084508-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43285 and previous config saved to /var/cache/conftool/dbconfig/20230124-084206-marostegui.json
* 08:39 marostegui: Starting x1 codfw failover from db2096 to db2115 - [[phab:T327745|T327745]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 [[phab:T327745|T327745]]', diff saved to https://phabricator.wikimedia.org/P43284 and previous config saved to /var/cache/conftool/dbconfig/20230124-083643-marostegui.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T327745|T327745]]
* 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T327745|T327745]]
* 08:35 kartik@deploy1002: dreamyjazz and kartik: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 08:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@8c87ca6]: (no justification provided) (duration: 00m 06s)
* 08:34 phedenskog@deploy1002: Started deploy [performance/navtiming@8c87ca6]: (no justification provided)
* 08:33 kartik@deploy1002: Started scap: Backport for [[gerrit:882240{{!}}Enable write new for CheckUserLog comment fields on testwikis (T233004)]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43283 and previous config saved to /var/cache/conftool/dbconfig/20230124-083200-root.json
* 08:28 kartik@deploy1002: Finished scap: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]] (duration: 09m 09s)
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2110 from API [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43282 and previous config saved to /var/cache/conftool/dbconfig/20230124-082440-marostegui.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43281 and previous config saved to /var/cache/conftool/dbconfig/20230124-082138-marostegui.json
* 08:21 kartik@deploy1002: kartik and matmarex: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43280 and previous config saved to /var/cache/conftool/dbconfig/20230124-082025-root.json
* 08:19 kartik@deploy1002: Started scap: Backport for [[gerrit:883098{{!}}Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)]]
* 08:18 marostegui: Starting s4 codfw failover from db2140 to db2110 - [[phab:T327739|T327739]]
* 08:16 kartik@deploy1002: Finished scap: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]] (duration: 10m 25s)
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:882266{{!}}Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)]]
* 07:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T327739|T327739]]
* 07:58 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 [[phab:T327739|T327739]]
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 [[phab:T327739|T327739]]', diff saved to https://phabricator.wikimedia.org/P43279 and previous config saved to /var/cache/conftool/dbconfig/20230124-075824-root.json
* 07:50 moritzm: installing Linux 5.10.162 on Bullseye hosts
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl [[phab:T327616|T327616]]', diff saved to https://phabricator.wikimedia.org/P43278 and previous config saved to /var/cache/conftool/dbconfig/20230124-074323-marostegui.json
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43277 and previous config saved to /var/cache/conftool/dbconfig/20230124-064905-ladsgroup.json
* 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43276 and previous config saved to /var/cache/conftool/dbconfig/20230124-064554-ladsgroup.json
* 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43275 and previous config saved to /var/cache/conftool/dbconfig/20230124-063358-ladsgroup.json
* 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43274 and previous config saved to /var/cache/conftool/dbconfig/20230124-063048-ladsgroup.json
* 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43273 and previous config saved to /var/cache/conftool/dbconfig/20230124-061852-ladsgroup.json
* 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43272 and previous config saved to /var/cache/conftool/dbconfig/20230124-061541-ladsgroup.json
* 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43271 and previous config saved to /var/cache/conftool/dbconfig/20230124-060345-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43270 and previous config saved to /var/cache/conftool/dbconfig/20230124-060129-ladsgroup.json
* 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43269 and previous config saved to /var/cache/conftool/dbconfig/20230124-060035-ladsgroup.json
* 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P43268 and previous config saved to /var/cache/conftool/dbconfig/20230124-055816-ladsgroup.json
* 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.18 (duration: 02m 07s)
* 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.20  refs [[phab:T325583|T325583]] (duration: 53m 01s)
* 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.20  refs [[phab:T325583|T325583]]
* 03:30 AndyRussG: payments-wiki upgraded from {{Gerrit|3d882ac7}} to {{Gerrit|15395d05}}
* 02:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
* 02:27 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
* 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
* 02:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
* 02:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2019.codfw.wmnet
* 02:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
* 02:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
* 01:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
* 01:51 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
* 01:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
* 01:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
* 01:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
* 01:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
* 01:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
* 01:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
* 01:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
* 01:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
* 00:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
* 00:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
* 00:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
* 00:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
* 00:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
* 00:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
* 00:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
* 00:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
* 00:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
* 00:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]] (duration: 12m 47s)
* 00:03 zabe@deploy1002: zabe: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 00:01 zabe@deploy1002: Started scap: Backport for [[gerrit:881466{{!}}Use core's PoolCounterClient (T327336)]]


== 2020-05-03 ==
== 2023-01-23 ==
* 22:52 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. https://gerrit.wikimedia.org/r/593929
* 23:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
* 22:42 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. https://gerrit.wikimedia.org/r/591459
* 23:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
* 21:37 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service (duration: 04m 22s)
* 23:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
* 21:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service
* 23:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
* 23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
* 23:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
* 22:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
* 22:57 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 22:56 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
* 22:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
* 22:49 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
* 22:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
* 22:46 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
* 22:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
* 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
* 22:31 maryum: Deployed patch for [[phab:T285159|T285159]]
* 21:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
* 21:40 zabe@deploy1002: Finished scap: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]] (duration: 07m 45s)
* 21:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
* 21:34 zabe@deploy1002: zabe: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:32 zabe@deploy1002: Started scap: Backport for [[gerrit:882746{{!}}throttle: Remove expired rule]]
* 21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
* 21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
* 21:12 kindrobot: close UTC late backport window
* 21:12 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]] (duration: 09m 00s)
* 21:04 kindrobot@deploy1002: jdrewniak and kindrobot: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:03 kindrobot@deploy1002: Started scap: Backport for [[gerrit:882715{{!}}Enable Page Tools for logged-in users on enwiki (T327686)]]
* 21:01 kindrobot: start UTC late backport window
* 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 20:45 taavi: restart [[phab:T315510|T315510]] on group1 after mwmaint restart, currently running on wikidatawiki
* 19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
* 19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
* 19:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
* 19:30 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
* 19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
* 19:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 19:17 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
* 19:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 19:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
* 19:16 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
* 18:48 mutante: miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
* 18:19 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf  - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - [[phab:T327405|T327405]]
* 18:08 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf  - unlink /etc/apache2/mods-enabled/auth_cas.load
* 18:05 mutante: miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
* 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
* 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
* 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
* 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
* 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
* 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 16:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 16:48 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:882682{{!}} Bumping portals to master (T128546)]] (duration: 06m 48s)
* 16:42 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:882682{{!}} Bumping portals to master (T128546)]] (duration: 06m 48s)
* 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
* 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
* 15:53 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: [[phab:T326634|T326634]]
* 15:50 urbanecm: Deploy security patch for [[phab:T327613|T327613]]
* 15:48 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 15:48 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
* 15:44 papaul: ongoing maintenance on fasw-codfw
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
* 15:17 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: [[phab:T325563|T325563]]
* 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
* 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
* 15:09 taavi@deploy1002: Finished scap: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]] (duration: 07m 28s)
* 15:03 taavi@deploy1002: taavi and trainbranchbot: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 15:02 taavi@deploy1002: Started scap: Backport for [[gerrit:882661{{!}}Revert "Enable Linter write namespace tag and template using core config"]]
* 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
* 15:00 taavi@deploy1002: Finished scap: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]] (duration: 07m 56s)
* 14:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:53 taavi@deploy1002: taavi and sbailey: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:52 taavi@deploy1002: Started scap: Backport for [[gerrit:880989{{!}}Enable Linter write namespace tag and template using core config (T299612)]]
* 14:46 taavi@deploy1002: Finished scap: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]] (duration: 08m 48s)
* 14:42 sukhe: rolling out pybal 1.15.10: [[phab:T321191|T321191]]
* 14:39 taavi@deploy1002: taavi and func: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:37 taavi@deploy1002: Started scap: Backport for [[gerrit:882179{{!}}SpecialUserrights: Allow updating the expiry of user groups (T327605)]]
* 14:37 taavi@deploy1002: Finished scap: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]] (duration: 11m 24s)
* 14:27 taavi@deploy1002: stang and taavi: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 14:25 taavi@deploy1002: Started scap: Backport for [[gerrit:876196{{!}}zhwiki: Install PageAssessments (T326387)]]
* 14:25 taavi@deploy1002: Finished scap: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]] (duration: 09m 22s)
* 14:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:18 taavi: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # [[phab:T326387|T326387]]
* 14:17 taavi@deploy1002: taavi and stang: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:16 taavi@deploy1002: Started scap: Backport for [[gerrit:882422{{!}}bnwikiquote: Update logo (T323131)]], [[gerrit:882425{{!}}shnwikibooks: Add project logo (T327380)]]
* 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
* 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
* 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 12:06 marostegui: dbmaint Reboot db2135 (m5 codfw master)
* 12:06 marostegui: dbmaint Reboot db2134 (m3 codfw master)
* 12:05 Emperor: removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
* 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
* 11:58 marostegui: dbmaint Reboot db2133 (m2 codfw master)
* 11:57 marostegui: dbmaint Reboot db2132 (m1 codfw master)
* 11:57 marostegui: Reboot db2132 (m1 codfw master)
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
* 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2114 [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
* 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2129 to s6 primary [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
* 11:17 Amir1: Starting s6 codfw failover from db2114 to db2129 - [[phab:T327644|T327644]]
* 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 ([[phab:T323827|T323827]])', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
* 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
* 10:55 XioNoX: update management routers ACLs to add new bast hosts
* 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2129 with weight 0 [[phab:T327644|T327644]]', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
* 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T327644|T327644]]
* 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T327644|T327644]]
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
* 10:48 vgutierrez: rolling upgrade to HAProxy 2.4.20 on ulsfo
* 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
* 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
* 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
* 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
* 10:39 btullis@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
* 10:39 btullis@deploy1002: Installing scap version "4.33.1" for 1 hosts
* 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
* 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
* 10:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
* 10:07 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]] (duration: 07m 51s)
* 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
* 10:01 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:59 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:877244{{!}}Remove Flow as default in techconductwiki]]
* 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 08:49 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:49 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
* 08:48 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
* 08:46 volans@cumin1001: START - Cookbook sre.dns.netbox
* 08:45 zabe@deploy1002: Finished scap: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]] (duration: 07m 48s)
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
* 08:39 zabe@deploy1002: zabe: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:37 zabe@deploy1002: Started scap: Backport for [[gerrit:882217{{!}}Remove oversight group from privileged groups (T112147)]], [[gerrit:882577{{!}}Start reading from cuc_comment_id on wikidatawiki (T233004)]]
* 08:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
* 08:36 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
* 08:30 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]] (duration: 17m 12s)
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
* 08:22 ladsgroup@deploy1002: ladsgroup and matmarex: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:12 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:882174{{!}}Tweaks for new heading HTML structure (T327328 T327469)]]
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
* 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
* 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
* 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
* 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 db1206 [[phab:T326669|T326669]]', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
* 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 06:23 kart_: Updated cxserver to 2023-01-20-051603-production ([[phab:T323840|T323840]], [[phab:T326236|T326236]])
* 06:19 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:18 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 06:16 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 06:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 06:12 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
* 04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2113 [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
* 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2123 to s5 primary [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
* 04:57 Amir1: Starting s5 codfw failover from db2113 to db2123 - [[phab:T327611|T327611]]
* 04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2123 with weight 0 [[phab:T327611|T327611]]', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
* 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T327611|T327611]]
* 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T327611|T327611]]
* 04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2107 [[phab:T327609|T327609]]', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
* 03:52 Amir1: Starting s2 codfw failover from db2107 to db2104 - [[phab:T327609|T327609]]


== 2020-05-02 ==
* 07:49 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(49{{!}}5[0-9]{{!}}6[0-2])\.eqiad\.wmnet
* 07:08 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 0 member 1
* 02:36 volker-e@deploy1001: Finished deploy [design/style-guide@f0d467b]: Deploy design/style-guide: (duration: 00m 07s)
* 02:36 volker-e@deploy1001: Started deploy [design/style-guide@f0d467b]: Deploy design/style-guide:


== 2023-01-20 ==
* 18:22 jynus: deploying new grants for backups on m1 [[phab:T327155|T327155]]
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
* 13:08 moritzm: installing node-minimatch security updates
* 13:01 moritzm: installing libxstream-java security updates
* 13:00 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: [[phab:T325557|T325557]]
* 12:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
* 12:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye
* 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
* 12:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
* 12:17 moritzm: installing ping1003 [[phab:T273509|T273509]]
* 12:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye
* 12:03 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
* 12:02 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
* 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
* 10:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
* 10:32 elukey: restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers)
* 10:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:13 moritzm: installing emacs security updates on bullseye
* 10:13 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:12 moritzm: imported jenkins 2.375-2 to thirdparty/ci [[phab:T326531|T326531]]
* 10:00 jnuche@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
* 10:00 jnuche@deploy1002: Installing scap version "4.33.1" for 1 hosts
* 08:59 moritzm: installing ping2003 [[phab:T273509|T273509]]
* 08:10 elukey: restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane
* 07:58 elukey: `apt-get clean` on doh4001 to free space (root partition almost filled)
* 01:55 ejegg: payments-wiki upgraded from {{Gerrit|3cf03933}} to {{Gerrit|3d882ac7}}
* 01:12 ejegg: payments-wiki upgraded from {{Gerrit|fcb9ab60}} to {{Gerrit|3cf03933}}


== 2020-05-01 ==
== 2023-01-19 ==
* 19:56 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 21:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS bullseye
* 18:57 gehel: restart blazegraph on wdqs1006 - [[phab:T242453|T242453]]
* 21:42 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881677{{!}}Enable Page tools on viwiki and itwiki (T327348)]] (duration: 10m 38s)
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11110 and previous config saved to /var/cache/conftool/dbconfig/20200501-142354-marostegui.json
* 21:33 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for [[gerrit:881677{{!}}Enable Page tools on viwiki and itwiki (T327348)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:18 hknust: holger@mwmaint1002 finished renameInvalidUsernames.php (fail) as part of [[phab:T219279|T219279]]
* 21:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11109 and previous config saved to /var/cache/conftool/dbconfig/20200501-140603-marostegui.json
* 21:31 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881677{{!}}Enable Page tools on viwiki and itwiki (T327348)]]
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11108 and previous config saved to /var/cache/conftool/dbconfig/20200501-134707-marostegui.json
* 21:27 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881612{{!}}Fix grid blowout with limited width turned off (T327423)]] (duration: 08m 26s)
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly warm up db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11107 and previous config saved to /var/cache/conftool/dbconfig/20200501-132804-marostegui.json
* 21:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
* 13:06 hknust: holger@mwmaint1002 Starting renameInvalidUsernames.php as part of [[phab:T219279|T219279]]
* 21:20 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 13s)
* 13:01 vgutierrez: rolling restart of ats-tls in text@esams - [[phab:T249335|T249335]]
* 21:20 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
* 12:24 mutante: mw230* - rolling restart of php-fpm - icinga warnings about opcache health in codfw
* 21:20 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for [[gerrit:881612{{!}}Fix grid blowout with limited width turned off (T327423)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 12:20 mutante: mw2376 - restarting php-fpm - icinga warnings about opcache health in codfw
* 21:18 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881612{{!}}Fix grid blowout with limited width turned off (T327423)]]
* 12:07 mutante: notebook1004 - puppet failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 29038". killing 29038, running puppet [[phab:T251560|T251560]]
* 21:11 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2039.codfw.wmnet with OS bullseye
* 12:05 mutante: notebook1003 - puppet failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 3288". killing 3288, running puppet [[phab:T251560|T251560]]
* 20:13 zabe@deploy1002: Finished scap: fix k8s drift (duration: 08m 02s)
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:05 zabe@deploy1002: Started scap: fix k8s drift
* 11:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:02 zabe@deploy1002: Finished scap: Backport for [[gerrit:881706{{!}}Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]] (duration: 14m 01s)
* 11:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:49 zabe@deploy1002: zabe: Backport for [[gerrit:881706{{!}}Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 11:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:48 zabe@deploy1002: Started scap: Backport for [[gerrit:881706{{!}}Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)]]
* 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:36 zabe: re-start populateCucComment on wikidatawiki post-mwmaint-reboot in screen with --sleep 2, will take ~30 hours # [[phab:T233004|T233004]]
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:16 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 08:54 _joe_: depooled all servers in the app pool in rack D1
* 18:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 08:54 oblivian@cumin1001: conftool action : set/pooled=no:weight=30; selector: name=mw13(49{{!}}5[0-5])\.eqiad\.wmnet
* 18:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 08:50 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw13(49{{!}}5[0-5])\.eqiad\.wmnet
* 18:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 08:48 _joe_: repooling mw1407 with LCStoreStaticArray, increased opcache, puppet disabled
* 18:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 08:45 _joe_: repooling mw1409
* 18:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 08:39 _joe_: repool mw1352
* 18:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 08:37 _joe_: depooling mw1352
* 18:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 07:44 marostegui: Copy wikireplica dump from labsdb1009 to labsdb1011 - [[phab:T249188|T249188]]
* 18:02 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 01:36 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service (duration: 04m 33s)
* 18:01 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 01:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service
* 17:36 Amir1: bash Krinkle> Vatican Interim Papacy Runbook, § 5.1: Notify Wikipedia about incoming traffic.
* 17:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2038.codfw.wmnet with OS bullseye
* 17:13 zabe@deploy1002: Finished scap: [[phab:T233004|T233004]] (duration: 18m 50s)
* 17:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
* 16:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
* 16:54 zabe@deploy1002: Started scap: [[phab:T233004|T233004]]
* 16:54 zabe@deploy1002: backport aborted:  (duration: 15m 22s)
* 16:48 godog: roll-restart opensearch-dashboards in logstash collectors eqiad - [[phab:T327161|T327161]]
* 16:44 zabe@deploy1002: Started scap: Backport for [[gerrit:881609{{!}}Add ability to start from cuc_id to populateCucComment (T233004)]]
* 16:42 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS bullseye
* 16:27 moritzm: installing cryptsetup updates for bullseye
* 16:18 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=1) rolling restart_daemons on A:logstash-collector
* 16:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
* 16:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
* 16:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
* 16:06 jclark@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:55 sukhe: update pybal to 1.15.10 on lvs4010: [[phab:T321191|T321191]]
* 15:45 effie: enable puppet on C:memcached hosts
* 15:42 godog: bounce opensearch on logstash102[34] - [[phab:T327161|T327161]]
* 15:30 sukhe: reprepro -C main include buster-wikimedia pybal_1.15.10_amd64.changes: [[phab:T321191|T321191]]
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43194 and previous config saved to /var/cache/conftool/dbconfig/20230119-151917-ladsgroup.json
* 15:17 effie: disable puppet on all C:memcached servers to deploy 812173
* 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43193 and previous config saved to /var/cache/conftool/dbconfig/20230119-150412-ladsgroup.json
* 14:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43192 and previous config saved to /var/cache/conftool/dbconfig/20230119-144907-ladsgroup.json
* 14:47 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:40 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43191 and previous config saved to /var/cache/conftool/dbconfig/20230119-143402-ladsgroup.json
* 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 14:32 zabe: run populateCulComment on group2 wikis # [[phab:T327290|T327290]]
* 14:30 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 13:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 12:27 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps2009.codfw.wmnet
* 12:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps2009.codfw.wmnet
* 12:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 12:06 moritzm: stopping/masking slapd on ldap-corp1001/ldap-corp2001 [[phab:T323820|T323820]]
* 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1054.eqiad.wmnet with OS bullseye
* 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 11:29 hnowlan: rebooting maps-codfw for updates
* 11:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps1009.eqiad.wmnet
* 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2004.codfw.wmnet
* 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
* 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
* 11:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
* 11:18 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
* 11:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
* 11:13 filippo@cumin1001: START - Cookbook sre.dns.netbox
* 11:09 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf2004.codfw.wmnet
* 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1004.eqiad.wmnet
* 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
* 11:06 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
* 11:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1054.eqiad.wmnet with OS bullseye
* 11:02 filippo@cumin1001: START - Cookbook sre.dns.netbox
* 10:58 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf1004.eqiad.wmnet
* 10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 10:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 10:44 hnowlan: rebooting maps-eqiad for updates
* 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
* 10:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
* 10:17 claime: Restarted maintenance scripts on mwmaint1002.eqiad.wmnet
* 10:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
* 10:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 10:15 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
* 10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
* 10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 10:06 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 10:05 claime: Stopping maintenance scripts on mwmaint1002.eqiad.wmnet for reboot
* 09:55 moritzm: installing ping3003 [[phab:T273509|T273509]]
* 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
* 09:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
* 09:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.19  refs [[phab:T325582|T325582]]
* 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 08:26 moritzm: installing sudo security updates
* 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2118 [[phab:T327372|T327372]]', diff saved to https://phabricator.wikimedia.org/P43190 and previous config saved to /var/cache/conftool/dbconfig/20230119-060449-ladsgroup.json
* 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary [[phab:T327372|T327372]]', diff saved to https://phabricator.wikimedia.org/P43189 and previous config saved to /var/cache/conftool/dbconfig/20230119-060316-ladsgroup.json
* 06:02 Amir1: Starting s7 codfw failover from db2118 to db2121 - [[phab:T327372|T327372]]
* 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 [[phab:T327372|T327372]]', diff saved to https://phabricator.wikimedia.org/P43188 and previous config saved to /var/cache/conftool/dbconfig/20230119-054243-ladsgroup.json
* 05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T327372|T327372]]
* 05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 [[phab:T327372|T327372]]


==Archives==
See [[Server admin log/Archives]].

== 2023-01-18 ==
* 23:47 zabe: run populateCulComment.php on all group0 and group1 wikis # [[phab:T327290|T327290]]
* 23:42 cstone: civicrm upgraded from {{Gerrit|164270b0}} to {{Gerrit|f6093fb2}}
* 22:35 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G  - bking@cumin1001 - [[phab:T323646|T323646]]
* 22:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G  - bking@cumin1001 - [[phab:T323646|T323646]]
* 21:50 kindrobot: close UTC late backport window
* 21:50 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:881462{{!}}[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] (duration: 10m 45s)
* 21:41 kindrobot@deploy1002: essexigyan and kindrobot: Backport for [[gerrit:881462{{!}}[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:39 kindrobot@deploy1002: Started scap: Backport for [[gerrit:881462{{!}}[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]]
* 21:36 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:881451{{!}}Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431{{!}}Legacy Vector is not a responsive skin (T327256)]] (duration: 13m 01s)
* 21:25 kindrobot@deploy1002: kindrobot and jdlrobson: Backport for [[gerrit:881451{{!}}Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431{{!}}Legacy Vector is not a responsive skin (T327256)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:23 kindrobot@deploy1002: Started scap: Backport for [[gerrit:881451{{!}}Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431{{!}}Legacy Vector is not a responsive skin (T327256)]]
* 21:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bullseye
* 21:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bullseye
* 21:03 kindrobot: start UTC late backport window
* 20:54 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
* 20:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
* 20:49 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
* 20:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
* 20:36 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS bullseye
* 20:35 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS bullseye
* 20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS buster
* 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:52 bblack: db1129 and lvs1017: removed misconfigured IP address in wrong vlan from eno1 and /e/n/i
* 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS buster
* 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
* 19:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
* 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
* 19:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
* 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS buster
* 18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
* 18:21 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:878927{{!}}Enable the REST API on test-wikidata (T324999)]] (duration: 09m 38s)
* 18:14 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for [[gerrit:878927{{!}}Enable the REST API on test-wikidata (T324999)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 18:12 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:878927{{!}}Enable the REST API on test-wikidata (T324999)]]
* 17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 17:44 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 560 hosts
* 17:44 jnuche@deploy1002: Installing scap version "4.33.0" for 560 hosts
* 17:42 jnuche@deploy1002: install-world aborted:  (duration: 07m 17s)
* 17:42 btullis@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
* 17:41 btullis@deploy1002: Installing scap version "4.33.0" for 1 hosts
* 17:35 jnuche@deploy1002: Installing scap version "4.33.0" for 561 hosts
* 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1037']
* 17:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
* 17:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1037']
* 17:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1036']
* 16:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1036']
* 16:45 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
* 16:45 jnuche@deploy1002: Installing scap version "4.33.0" for 1 hosts
* 16:39 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881023{{!}}[100%] English Wikipedia uses Vector 2022 skin]] (duration: 09m 27s)
* 16:31 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881023{{!}}[100%] English Wikipedia uses Vector 2022 skin]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:29 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881023{{!}}[100%] English Wikipedia uses Vector 2022 skin]]
* 16:20 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881022{{!}}[75%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 09m 24s)
* 16:13 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881022{{!}}[75%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:11 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881022{{!}}[75%] English Wikipedia uses Vector 2022 skin (T326892)]]
* 16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 15:58 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881021{{!}}[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]] (duration: 08m 52s)
* 15:51 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881021{{!}}[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:49 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881021{{!}}[50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)]]
* 15:44 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:881020{{!}}[25%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 09m 06s)
* 15:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1052.eqiad.wmnet with OS bullseye
* 15:37 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:36 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:881020{{!}}[25%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:35 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:881020{{!}}[25%] English Wikipedia uses Vector 2022 skin (T326892)]]
* 15:31 urandom: re-enabling Cassandra hinted-handoff for codfw -- [[phab:T327001|T327001]]
* 15:29 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:879659{{!}}[10%] English Wikipedia uses Vector 2022 skin (T326892)]] (duration: 11m 30s)
* 15:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
* 15:19 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [[gerrit:879659{{!}}[10%] English Wikipedia uses Vector 2022 skin (T326892)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
* 15:17 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:879659{{!}}[10%] English Wikipedia uses Vector 2022 skin (T326892)]]
* 15:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:880921{{!}}Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]] (duration: 09m 11s)
* 15:13 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
* 15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1052.eqiad.wmnet with OS bullseye
* 15:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for [[gerrit:880921{{!}}Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:880921{{!}}Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)]]
* 15:04 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:880920{{!}}Revert gallery changes in 1.40.0-wmf.18 (T326990)]] (duration: 13m 04s)
* 15:01 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
* 14:57 moritzm: uploaded python-jose 3.3.0+dfsg-4~wmf11u1 to apt.wikimedia.org (needed by python-social-auth/Bitu)
* 14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for [[gerrit:880920{{!}}Revert gallery changes in 1.40.0-wmf.18 (T326990)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:51 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:880920{{!}}Revert gallery changes in 1.40.0-wmf.18 (T326990)]]
* 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:881045{{!}}Revert "Breaking upgrade: mapdata" (T327151)]] (duration: 10m 33s)
* 14:37 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for [[gerrit:881045{{!}}Revert "Breaking upgrade: mapdata" (T327151)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:881045{{!}}Revert "Breaking upgrade: mapdata" (T327151)]]
* 14:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:879946{{!}}Write to cul_reason[_plaintext]_id everywhere (T233004)]] (duration: 19m 54s)
* 14:23 moritzm: installing mod-wsgi security updates
* 14:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for [[gerrit:879946{{!}}Write to cul_reason[_plaintext]_id everywhere (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:14 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:879946{{!}}Write to cul_reason[_plaintext]_id everywhere (T233004)]]
* 13:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
* 13:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
* 12:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
* 11:54 volans: upgraded cumin on cumin1001 to 4.2.0-1+deb11u1
* 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
* 11:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
* 11:42 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 11:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
* 11:16 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:16 volans@cumin1001: START - Cookbook sre.network.cf
* 11:15 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:15 volans@cumin1001: START - Cookbook sre.network.cf
* 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1050.eqiad.wmnet with OS bullseye
* 11:11 volans@cumin2002: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
* 11:11 volans@cumin2002: START - Cookbook sre.network.cf
* 11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
* 11:10 volans@cumin1001: START - Cookbook sre.network.cf
* 11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
* 11:10 volans@cumin1001: START - Cookbook sre.network.cf
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43185 and previous config saved to /var/cache/conftool/dbconfig/20230118-110716-marostegui.json
* 10:59 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 10:59 volans@cumin1001: START - Cookbook sre.network.cf
* 10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
* 10:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43184 and previous config saved to /var/cache/conftool/dbconfig/20230118-105106-marostegui.json
* 10:49 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 10:48 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 10:43 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1050.eqiad.wmnet with OS bullseye
* 10:21 zabe@deploy1002: Finished scap: Backport for [[gerrit:881361{{!}}Start reading from cuc_comment_id from a few wikis (T233004)]] (duration: 09m 17s)
* 10:14 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:881361{{!}}Start reading from cuc_comment_id from a few wikis (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 10:12 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 10:12 zabe@deploy1002: Started scap: Backport for [[gerrit:881361{{!}}Start reading from cuc_comment_id from a few wikis (T233004)]]
* 09:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:49 godog: start migration from webperf1004 to arclamp1001 - [[phab:T319434|T319434]]
* 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
* 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
* 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
* 09:33 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
* 09:24 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]] (duration: 08m 20s)
* 09:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]]
* 08:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
* 08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe2010.codfw.wmnet
* 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=codfw
* 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=codfw
* 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
* 08:30 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 07:56 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 02:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
* 02:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
* 02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
* 02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
* 01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
* 01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
* 01:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
* 01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
* 01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
* 01:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
* 01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
* 01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
* 00:28 zabe: enwiki: rename the "discretionary sanctions alert" tag to "contentious topics alert" # [[phab:T327118|T327118]]
* 00:26 zabe@deploy1002: Finished scap: Backport for [[gerrit:881030{{!}}Add script to rename a change tag in wmf prod (T327118)]] (duration: 08m 29s)
* 00:20 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:881030{{!}}Add script to rename a change tag in wmf prod (T327118)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 00:18 zabe@deploy1002: Started scap: Backport for [[gerrit:881030{{!}}Add script to rename a change tag in wmf prod (T327118)]]
* 00:08 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=180p.vp9.webm # [[phab:T312153|T312153]]
* 00:07 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=120p.vp9.webm # [[phab:T312153|T312153]]
 
== 2023-01-17 ==
* 23:51 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "User:Amire80/frg" "Movement Multilingual Termbase" "Zabe" "per request [[:phab:T327149{{!}}T327149]]" # [[phab:T327149|T327149]]
* 23:33 zabe@deploy1002: Finished scap: Backport for [[gerrit:880905{{!}}Start reading from cuc_comment_id on testwiki (T233004)]], [[gerrit:880904{{!}}Start reading from cuc_actor everywhere (T233004)]] (duration: 09m 58s)
* 23:25 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:880905{{!}}Start reading from cuc_comment_id on testwiki (T233004)]], [[gerrit:880904{{!}}Start reading from cuc_actor everywhere (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:24 zabe@deploy1002: Started scap: Backport for [[gerrit:880905{{!}}Start reading from cuc_comment_id on testwiki (T233004)]], [[gerrit:880904{{!}}Start reading from cuc_actor everywhere (T233004)]]
* 23:19 zabe@deploy1002: Finished scap: Backport for [[gerrit:881026{{!}}Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004)]], [[gerrit:880925{{!}}Revert "Add read new support for cu_log comment ID columns" (T327219)]] (duration: 11m 46s)
* 23:09 zabe@deploy1002: zabe and dreamyjazz and zabe: Backport for [[gerrit:881026{{!}}Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004)]], [[gerrit:880925{{!}}Revert "Add read new support for cu_log comment ID columns" (T327219)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 23:07 zabe@deploy1002: Started scap: Backport for [[gerrit:881026{{!}}Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004)]], [[gerrit:880925{{!}}Revert "Add read new support for cu_log comment ID columns" (T327219)]]
* 23:06 zabe@deploy1002: Finished scap: Backport for [[gerrit:880903{{!}}Stop writing to cul_user and cul_user_text everywhere (T233004)]], [[gerrit:880902{{!}}Start writing to rev_comment_id everywhere (T299954)]] (duration: 10m 29s)
* 22:57 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:880903{{!}}Stop writing to cul_user and cul_user_text everywhere (T233004)]], [[gerrit:880902{{!}}Start writing to rev_comment_id everywhere (T299954)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 22:55 zabe@deploy1002: Started scap: Backport for [[gerrit:880903{{!}}Stop writing to cul_user and cul_user_text everywhere (T233004)]], [[gerrit:880902{{!}}Start writing to rev_comment_id everywhere (T299954)]]
* 22:51 bblack: repooling codfw
* 22:48 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:881016{{!}}Make sticky header edit button default for all wikis (T324799)]] (duration: 10m 34s)
* 22:39 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for [[gerrit:881016{{!}}Make sticky header edit button default for all wikis (T324799)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 22:38 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:881016{{!}}Make sticky header edit button default for all wikis (T324799)]]
* 22:30 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=non-existent1001
* 22:27 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:880915{{!}}Resolve deprecations and type changes in elastica 7.3.0]], [[gerrit:880917{{!}}UpdateSuggesterIndex: Properly cleanup bad indices]] (duration: 09m 42s)
* 22:25 bblack: cp2031: restart ats-be
* 22:20 ebernhardson@deploy1002: ebernhardson and ebernhardson: Backport for [[gerrit:880915{{!}}Resolve deprecations and type changes in elastica 7.3.0]], [[gerrit:880917{{!}}UpdateSuggesterIndex: Properly cleanup bad indices]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:18 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:880915{{!}}Resolve deprecations and type changes in elastica 7.3.0]], [[gerrit:880917{{!}}UpdateSuggesterIndex: Properly cleanup bad indices]]
* 22:14 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:880533{{!}}Show edit button in sticky header for desktop-improvement wikis (T324799)]] (duration: 10m 43s)
* 22:05 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for [[gerrit:880533{{!}}Show edit button in sticky header for desktop-improvement wikis (T324799)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:04 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:880533{{!}}Show edit button in sticky header for desktop-improvement wikis (T324799)]]
* 21:54 ebernhardson: Finished scap: Backport for [[gerrit:880913{{!}}Table of contents Collapse/Expand not working (T327064)]]
* 21:54 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:881008{{!}}Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"]] (duration: 09m 20s)
* 21:52 zabe: zabe@mwmaint1002:~$ mwscript extensions/CheckUser/maintenance/populateCulComment.php --wiki testwiki
* 21:46 ebernhardson@deploy1002: ebernhardson and trainbranchbot: Backport for [[gerrit:881008{{!}}Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:44 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:881008{{!}}Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"]]
* 21:42 ebernhardson@deploy1002: Sync cancelled.
* 21:35 ebernhardson@deploy1002: ebernhardson and dreamyjazz: Backport for [[gerrit:879653{{!}}Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 21:34 ebernhardson: scap also backporting [[gerrit:880913{{!}}Table of contents Collapse/Expand not working (T327064)]]
* 21:34 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:879653{{!}}Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004)]]
* 21:29 ebernhardson@deploy1002: Finished scap: Backport for [[gerrit:880568{{!}}Enable Phonos on afwiktionary and arwiki (T324561)]] (duration: 12m 21s)
* 21:18 ebernhardson@deploy1002: ebernhardson and hmonroy: Backport for [[gerrit:880568{{!}}Enable Phonos on afwiktionary and arwiki (T324561)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 21:17 ebernhardson@deploy1002: Started scap: Backport for [[gerrit:880568{{!}}Enable Phonos on afwiktionary and arwiki (T324561)]]
* 21:00 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (had been left depooled from previous powercycle)
* 20:47 ryankemper: [WDQS] Depooled `wdqs1016`
* 20:25 herron: ran preferred-replica-election on kafka-logging codfw to clear replica imbalance
* 20:18 ryankemper: [WDQS] Restart blazegraph on `wdqs1016` to clear alert: `ryankemper@wdqs1016:~$ sudo systemctl restart wdqs-blazegraph`
* 20:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]]
* 20:04 eileen: config revision changed from {{Gerrit|2e5cee3c}} to {{Gerrit|7425df0b}}
* 19:50 ryankemper: [[phab:T327175|T327175]] Reprocessing last several hours of updates (`2023-01-17T12:00:00Z` -> `2023-01-17T17:30:00Z`) on codfw elasticsearch, running on `ryankemper@mwmaint2002` tmux session `reindex`
* 19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 19:41 zabe@deploy1002: Finished scap: Backport for [[gerrit:880916{{!}}Revert "Revert "Enable visual enhancements on all talk namespaces""]] (duration: 10m 25s)
* 19:32 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:880916{{!}}Revert "Revert "Enable visual enhancements on all talk namespaces""]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 19:30 zabe@deploy1002: Started scap: Backport for [[gerrit:880916{{!}}Revert "Revert "Enable visual enhancements on all talk namespaces""]]
* 18:48 zabe@deploy1002: Finished scap: Backport for [[gerrit:880914{{!}}Revert "Enable visual enhancements on all talk namespaces"]] (duration: 09m 08s)
* 18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
* 18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
* 18:41 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:880914{{!}}Revert "Enable visual enhancements on all talk namespaces"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 18:39 zabe@deploy1002: Started scap: Backport for [[gerrit:880914{{!}}Revert "Enable visual enhancements on all talk namespaces"]]
* 18:39 zabe@deploy1002: backport aborted:  (duration: 00m 26s)
* 18:35 zabe@deploy1002: backport aborted:  (duration: 19m 41s)
* 18:29 otto@deploy1002: Finished deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac] (duration: 04m 28s)
* 18:29 otto@deploy1002: Finished deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train @8d0e919] (duration: 00m 15s)
* 18:29 otto@deploy1002: Started deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train @8d0e919]
* 18:25 otto@deploy1002: Started deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac]
* 18:25 zabe@deploy1002: zabe and matmarex and zabe: Backport for [[gerrit:880908{{!}}objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158)]], [[gerrit:878169{{!}}Use new DiscussionTools heading markup on enwiki (T314714)]], [[gerrit:879158{{!}}Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955)]], [[gerrit:879159{{!}}Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907)]], [[
* 18:23 zabe@deploy1002: Started scap: Backport for [[gerrit:880908{{!}}objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158)]], [[gerrit:878169{{!}}Use new DiscussionTools heading markup on enwiki (T314714)]], [[gerrit:879158{{!}}Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955)]], [[gerrit:879159{{!}}Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907)]], [[gerrit:879103{{!}}
* 18:13 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 18:10 mutante: gerrit1002/gerrit2002: sudo rmdir /srv/gerrit/jvmlogs
* 18:07 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 18:07 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 18:05 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
* 18:01 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
* 17:58 jynus: restarted es5 codfw backup
* 17:54 bblack: authdns1001: restart confd
* 17:27 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=aqs,name=codfw
* 17:19 effie: pooling back codfw services
* 17:17 bblack: removing errant 2620:0:860:118: IPs from primary interfaces of hosts in B2
* 17:01 effie: restarting confd on deploy1002
* 16:59 effie: pooling back depooled mw servers in codfw
* 16:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
* 16:44 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
* 16:32 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1_amd64.changes: [[phab:T325557|T325557]]
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43179 and previous config saved to /var/cache/conftool/dbconfig/20230117-162100-ladsgroup.json
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43178 and previous config saved to /var/cache/conftool/dbconfig/20230117-160555-ladsgroup.json
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43177 and previous config saved to /var/cache/conftool/dbconfig/20230117-155050-ladsgroup.json
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43175 and previous config saved to /var/cache/conftool/dbconfig/20230117-153545-ladsgroup.json
* 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 14:56 urandom: truncating hints for Cassandra nodes in codfw row b -- [[phab:T327001|T327001]]
* 14:52 urandom: disabling Cassandra hinted-handoff for codfw -- [[phab:T327001|T327001]]
* 14:27 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 14:26 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 14:12 _joe_: try to restart cassandra-a on aqs2005
* 13:37 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw
* 13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=codfw
* 13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=codfw
* 13:27 jynus: manually restarting replication on es2020; may require a data check afterwards
* 13:26 _joe_: depooling all services in codfw
* 13:19 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool mobileapps in codfw: maintenance
* 13:15 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 13:14 oblivian@cumin1001: START - Cookbook sre.discovery.service-route depool mobileapps in codfw: maintenance
* 13:13 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check citoid: maintenance
* 13:13 oblivian@cumin1001: START - Cookbook sre.discovery.service-route check citoid: maintenance
* 13:08 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
* 13:01 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
* 13:01 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=.*
* 12:35 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
* 12:35 moritzm: installing ipython security updates
* 11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1048.eqiad.wmnet with OS bullseye
* 11:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
* 11:16 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
* 11:08 volans: upgraded cumin on cumin2002 to 4.2.0-1+deb11u1
* 11:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1048.eqiad.wmnet with OS bullseye
* 10:16 godog: restart opensearch_2@production-elk7-eqiad.service on logstash102[34]
* 10:12 jnuche@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 10:11 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]] (duration: 42m 26s)
* 09:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: (no justification provided) (duration: 00m 12s)
* 09:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: (no justification provided)
* 09:28 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]]
* 09:26 jnuche@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/home/jnuche/scap-image-build-and-push-log' (duration: 00m 50s)
* 09:26 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs [[phab:T325582|T325582]]
* 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 08:47 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:879652{{!}}Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]] (duration: 13m 50s)
* 08:35 ladsgroup@deploy1002: ladsgroup and dreamyjazz: Backport for [[gerrit:879652{{!}}Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 08:33 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:879652{{!}}Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)]]
* 08:29 kartik@deploy1002: Finished scap: Backport for [[gerrit:879998{{!}}testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]] (duration: 20m 56s)
* 08:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName='USER_TALK' # [[phab:T327146|T327146]]
* 08:13 kartik@deploy1002: kartik and kartik: Backport for [[gerrit:879998{{!}}testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:08 kartik@deploy1002: Started scap: Backport for [[gerrit:879998{{!}}testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)]]
* 07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43168 and previous config saved to /var/cache/conftool/dbconfig/20230117-075222-ladsgroup.json
* 07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43167 and previous config saved to /var/cache/conftool/dbconfig/20230117-073717-ladsgroup.json
* 07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43166 and previous config saved to /var/cache/conftool/dbconfig/20230117-072212-ladsgroup.json
* 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43165 and previous config saved to /var/cache/conftool/dbconfig/20230117-070707-ladsgroup.json
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1173 [[phab:T326134|T326134]]', diff saved to https://phabricator.wikimedia.org/P43164 and previous config saved to /var/cache/conftool/dbconfig/20230117-070532-ladsgroup.json
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1131 to s6 primary and set section read-write [[phab:T326134|T326134]]', diff saved to https://phabricator.wikimedia.org/P43163 and previous config saved to /var/cache/conftool/dbconfig/20230117-070102-ladsgroup.json
* 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - [[phab:T326134|T326134]]', diff saved to https://phabricator.wikimedia.org/P43162 and previous config saved to /var/cache/conftool/dbconfig/20230117-070035-ladsgroup.json
* 07:00 Amir1: Starting s6 eqiad failover from db1173 to db1131 - [[phab:T326134|T326134]]
* 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T326134|T326134]]', diff saved to https://phabricator.wikimedia.org/P43160 and previous config saved to /var/cache/conftool/dbconfig/20230117-060710-ladsgroup.json
* 06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T326134|T326134]]
* 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 [[phab:T326134|T326134]]
 
== 2023-01-16 ==
* 17:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 17:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 17:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 16:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 16:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1044.eqiad.wmnet with OS bullseye
* 16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
* 16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
* 16:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1044.eqiad.wmnet with OS bullseye
* 16:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1042.eqiad.wmnet with OS bullseye
* 16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
* 15:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
* 15:47 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1042.eqiad.wmnet with OS bullseye
* 13:35 XioNoX: disable one of 3 cr1-cr2 eqiad links - [[phab:T304712|T304712]]
* 13:34 XioNoX: repool eqiad-eqord link - [[phab:T304712|T304712]]
* 12:56 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
* 12:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 12:50 XioNoX: drain eqiad-eqord link - [[phab:T304712|T304712]]
* 12:47 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 12:43 Amir1: power cycled db1198
* 12:36 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
* 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[5-9].eqiad.wmnet
* 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102[012].eqiad.wmnet
* 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102.eqiad.wmnet
* 12:05 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
* 12:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
* 11:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:32 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
* 10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:48 moritzm: installing libtasn1-6 security updates on Bullseye
* 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
* 08:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
* 08:46 elukey: powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
* 08:14 oblivian@deploy1002: Synchronized README: test null deployment for [[phab:T327041|T327041]] (duration: 07m 12s)
* 08:09 Emperor: stopped swift_rclone_sync on ms-be1069
* 07:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse20(0[6-9]{{!}}10).codfw.wmnet
* 07:44 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw23([12][0-9]{{!}}3[0-4]).codfw.wmnet
* 07:41 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw22(59{{!}}6[0-9]{{!}}70).codfw.wmnet
* 07:26 _joe_: restarting pybal on lvs2009
* 07:10 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=(mw.*{{!}}appservers{{!}}api)-ro,name=codfw
* 07:10 _joe_: depooling mediawiki in codfw
* 06:47 XioNoX: add 2001:67c:930::/48 to network:external in data.yaml
* 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
* 06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
* 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1198 maint', diff saved to https://phabricator.wikimedia.org/P43157 and previous config saved to /var/cache/conftool/dbconfig/20230116-062211-ladsgroup.json
* 02:25 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=parsoid-php
* 02:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,service=nginx
* 02:01 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,service=nginx
* 01:51 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet
* 01:35 Amir1: rolling restart of php-fpm across the fleet
* 01:30 thcipriani: 01:29:56 php-fpm-restart: 100% (in-flight: 0; ok: 184; fail: 112; left: 0)
* 01:29 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]] (duration: 24m 47s)
* 01:15 thcipriani@deploy1002: thcipriani and func: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 01:05 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879798{{!}}LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)]]
 
== 2023-01-14 ==
* 09:46 godog: issue 'request system reboot member 2' - [[phab:T327001|T327001]]
* 09:20 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
* 09:19 Emperor: depool thanos-fe2002 [[phab:T327001|T327001]]
* 09:19 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=ms-fe2010.codfw.wmnet
* 09:19 Emperor: depool ms-fe2010 [[phab:T327001|T327001]]
 
== 2023-01-13 ==
* 23:39 mutante: people2002 - systemctl reset-failed after removing auto_restart_rsync timers
* 22:26 mutante: mirror1001 - systemctl start update-ubuntu-mirror (sometimes sync fails)
* 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1011']
* 20:58 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 20:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1011']
* 20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['druid1011']
* 20:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
* 20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1010']
* 20:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
* 20:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1010']
* 20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
* 20:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
* 20:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1009']
* 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
* 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
* 20:04 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict2001.codfw.wmnet
* 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bullseye
* 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict2001.codfw.wmnet on all recursors
* 19:54 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache aphlict2001.codfw.wmnet on all recursors
* 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:54 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
* 19:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
* 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 19:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host aphlict2001.codfw.wmnet
* 19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bullseye
* 19:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
* 19:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
* 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
* 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
* 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bullseye
* 18:25 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Green Giant" "Cromium" # [[phab:T298707|T298707]]
* 17:34 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]] (duration: 13m 25s)
* 17:22 thcipriani@deploy1002: thcipriani and abi: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 17:20 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879793{{!}}TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)]]
* 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1004.eqiad.wmnet with OS bullseye
* 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 15:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 15:24 jynus: restarted update-ubuntu-mirror on mirror1001 again due to remote server concurrency issues
* 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
* 15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bullseye
* 15:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1003.eqiad.wmnet with OS bullseye
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 15:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
* 15:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
* 15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1004.eqiad.wmnet with OS bullseye
* 14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
* 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
* 14:49 volans: uploaded cumin_4.2.0 to apt.wikimedia.org bullseye-wikimedia
* 14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1003.eqiad.wmnet with OS bullseye
* 12:48 moritzm: installing bast6002 [[phab:T324974|T324974]]
* 12:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
* 12:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastions - jmm@cumin2002"
* 11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastions - jmm@cumin2002"
* 10:53 moritzm: installing bast5003 [[phab:T324974|T324974]]
* 10:49 jynus: restarting update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
* 09:41 moritzm: installing bast4004 [[phab:T324974|T324974]]
* 09:06 moritzm: installing bast3006 [[phab:T324974|T324974]]
* 02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 01:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1002']
* 01:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
* 01:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1001']
* 01:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
* 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1002']
* 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1001']
* 01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
* 01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
* 01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1004']
* 01:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
* 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1003']
* 01:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
* 00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1004']
* 00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1003']
* 00:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
* 00:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
* 00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:15 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
 
== 2023-01-12 ==
* 23:53 zabe: start running cuc_comment_id population script on rest of sections in screens with --sleep 2 # [[phab:T233004|T233004]]
* 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 23:13 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3 (duration: 02m 31s)
* 23:10 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3
* 23:08 sbassett: Deployed (temporary) security mitigations for [[phab:T326691|T326691]]
* 22:45 mutante: people2002 - apt-get remove --purge rsync
* 22:08 zabe: start of "foreachwikiindblist s3.dblist extensions/CheckUser/maintenance/populateCucComment.php" in a screen in mwmaint1002 # [[phab:T233004|T233004]]
* 22:07 thcipriani: end UTC late backport
* 22:06 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879161{{!}}cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343{{!}}cirrus: Disable incoming link counting (T317023)]] (duration: 09m 23s)
* 21:59 krinkle@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 08s)
* 21:59 krinkle@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
* 21:59 Krinkle: krinkle@deploy1002$ `scap install-world -v --limit-hosts` for webperf1003.eqiad and webperf2003.codfw, ref [[phab:T326668|T326668]]
* 21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
* 21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
* 21:58 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for [[gerrit:879161{{!}}cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343{{!}}cirrus: Disable incoming link counting (T317023)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
* 21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
* 21:57 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879161{{!}}cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343{{!}}cirrus: Disable incoming link counting (T317023)]]
* 21:56 zabe: run populateCucComment.php on testwiki # [[phab:T233004|T233004]]
* 21:48 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879600{{!}}nlwiki: Add block right to checkuser group (T326355)]] (duration: 09m 04s)
* 21:41 thcipriani@deploy1002: thcipriani and stang: Backport for [[gerrit:879600{{!}}nlwiki: Add block right to checkuser group (T326355)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:39 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879600{{!}}nlwiki: Add block right to checkuser group (T326355)]]
* 21:37 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879571{{!}}looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]] (duration: 09m 10s)
* 21:30 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for [[gerrit:879571{{!}}looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:28 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879571{{!}}looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]]
* 21:27 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:879561{{!}}etwikiquote: Switch logo variant back (T313698)]] (duration: 09m 25s)
* 21:21 ejegg: restarted fundraising scheduled jobs
* 21:19 ejegg: civicrm upgraded from {{Gerrit|9afd2789}} to {{Gerrit|7ecb5038}}
* 21:19 thcipriani@deploy1002: thcipriani and stang: Backport for [[gerrit:879561{{!}}etwikiquote: Switch logo variant back (T313698)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 21:17 thcipriani@deploy1002: Started scap: Backport for [[gerrit:879561{{!}}etwikiquote: Switch logo variant back (T313698)]]
* 21:16 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:868816{{!}}Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]] (duration: 10m 43s)
* 21:07 thcipriani@deploy1002: thcipriani and samwilson: Backport for [[gerrit:868816{{!}}Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 21:05 thcipriani@deploy1002: Started scap: Backport for [[gerrit:868816{{!}}Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]]
* 20:43 ejegg: rolled back CiviCRM to {{Gerrit|9afd2789}}
* 20:31 ejegg: civicrm upgraded from {{Gerrit|9afd2789}} to {{Gerrit|7ecb5038}}
* 20:29 ejegg: disabled fundraising scheduled jobs for civi deploy
* 20:08 brett: Setting thread_pool_max for varnish-frontend to 12000
* 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43148 and previous config saved to /var/cache/conftool/dbconfig/20230112-195922-marostegui.json
* 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43147 and previous config saved to /var/cache/conftool/dbconfig/20230112-195651-marostegui.json
* 19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 (mariadb 11) to dbctl, depooled [[phab:T326116|T326116]]', diff saved to https://phabricator.wikimedia.org/P43146 and previous config saved to /var/cache/conftool/dbconfig/20230112-195514-marostegui.json
* 19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.18 refs [[phab:T325581|T325581]]
* 18:36 mutante: stat1008 - systemctl reset-failed - clears Icinga alerts from previously failed units
* 18:35 mutante: stat1007 - systemctl reset-failed - clears Icinga alerts
* 18:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
* 18:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
* 17:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
* 17:45 mutante: powercycling mc2040 via mgmt console
* 17:34 ejegg: civicrm rolled back from {{Gerrit|7ecb5038}} to {{Gerrit|9afd2789}}
* 17:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
* 17:08 btullis@cumin1001: Added views for new wiki: aswikiquote [[phab:T321294|T321294]]
* 17:05 ejegg: civicrm upgraded from {{Gerrit|9afd2789}} to {{Gerrit|7ecb5038}}
* 16:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
* 16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 16:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 16:43 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
* 16:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 16:31 zabe@deploy1002: Finished scap: Backport for [[gerrit:879590{{!}}Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591{{!}}Start writing to rev_comment_id on group1 wikis (T299954)]] (duration: 09m 49s)
* 16:23 zabe@deploy1002: zabe and zabe: Backport for [[gerrit:879590{{!}}Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591{{!}}Start writing to rev_comment_id on group1 wikis (T299954)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:21 zabe@deploy1002: Started scap: Backport for [[gerrit:879590{{!}}Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591{{!}}Start writing to rev_comment_id on group1 wikis (T299954)]]
* 16:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync