Server Admin Log: Difference between revisions

imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32369 and previous config saved to /var/cache/conftool/dbconfig/20220812-010312-ladsgroup.json)
imported>Stashbot
(jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye)
 
(267 intermediate revisions by 3 users not shown)
== 2022-08-12 ==
* 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32369 and previous config saved to /var/cache/conftool/dbconfig/20220812-010312-ladsgroup.json
* 01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32368 and previous config saved to /var/cache/conftool/dbconfig/20220812-010233-ladsgroup.json
* 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32367 and previous config saved to /var/cache/conftool/dbconfig/20220812-004727-ladsgroup.json
* 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32366 and previous config saved to /var/cache/conftool/dbconfig/20220812-003221-ladsgroup.json
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32365 and previous config saved to /var/cache/conftool/dbconfig/20220812-001715-ladsgroup.json
== 2023-06-09 ==
* 21:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
* 21:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
* 20:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
* 20:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
* 20:38 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart-reboot (exit_code=97) rolling restart_daemons on A:aqs
* 20:23 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs
* 17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS bullseye
* 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
* 17:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49398 and previous config saved to /var/cache/conftool/dbconfig/20230609-173202-ladsgroup.json
* 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P49397 and previous config saved to /var/cache/conftool/dbconfig/20230609-171656-ladsgroup.json
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P49396 and previous config saved to /var/cache/conftool/dbconfig/20230609-170150-ladsgroup.json
* 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49395 and previous config saved to /var/cache/conftool/dbconfig/20230609-164644-ladsgroup.json
* 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49394 and previous config saved to /var/cache/conftool/dbconfig/20230609-163007-ladsgroup.json
* 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49393 and previous config saved to /var/cache/conftool/dbconfig/20230609-162946-ladsgroup.json
* 16:20 urandom: powercycling restbase1028
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P49392 and previous config saved to /var/cache/conftool/dbconfig/20230609-161440-ladsgroup.json
* 16:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host snapshot1017.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['snapshot1016']
* 16:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['snapshot1016']
* 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P49391 and previous config saved to /var/cache/conftool/dbconfig/20230609-155934-ladsgroup.json
* 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host snapshot1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49390 and previous config saved to /var/cache/conftool/dbconfig/20230609-154428-ladsgroup.json
* 15:30 andrewbogott: wikitech-static: deleted everything in /srv/mediawiki/images/wikitech/archive for [[phab:T338520|T338520]]
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49388 and previous config saved to /var/cache/conftool/dbconfig/20230609-152845-ladsgroup.json
* 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49387 and previous config saved to /var/cache/conftool/dbconfig/20230609-152824-ladsgroup.json
* 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host snapshot1017.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host snapshot1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for snapshot101[6-7] - pt1979@cumin2002"
* 15:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for snapshot101[6-7] - pt1979@cumin2002"
* 15:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P49386 and previous config saved to /var/cache/conftool/dbconfig/20230609-151318-ladsgroup.json
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P49385 and previous config saved to /var/cache/conftool/dbconfig/20230609-145812-ladsgroup.json
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49384 and previous config saved to /var/cache/conftool/dbconfig/20230609-144305-ladsgroup.json
* 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49383 and previous config saved to /var/cache/conftool/dbconfig/20230609-142731-ladsgroup.json
* 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49382 and previous config saved to /var/cache/conftool/dbconfig/20230609-142655-ladsgroup.json
* 14:14 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
* 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P49381 and previous config saved to /var/cache/conftool/dbconfig/20230609-141149-ladsgroup.json
* 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P49380 and previous config saved to /var/cache/conftool/dbconfig/20230609-135643-ladsgroup.json
* 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49379 and previous config saved to /var/cache/conftool/dbconfig/20230609-134137-ladsgroup.json
* 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
* 13:29 sukhe: start pybal on lvs2013
* 13:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
* 13:25 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
* 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49378 and previous config saved to /var/cache/conftool/dbconfig/20230609-132541-ladsgroup.json
* 13:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49377 and previous config saved to /var/cache/conftool/dbconfig/20230609-132520-ladsgroup.json
* 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P49376 and previous config saved to /var/cache/conftool/dbconfig/20230609-131014-ladsgroup.json
* 13:07 sukhe: stop pybal on lvs2013 to test lvs2014
* 13:02 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs2014
* 13:02 sukhe: sudo cumin 'A:lvs and A:codfw' 'enable-puppet "CR 928818"'
* 13:01 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
* 12:59 sukhe: sudo cumin 'A:lvs and A:codfw' 'disable-puppet "CR 928818"'
* 12:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2014
* 12:57 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
* 12:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2014
* 12:55 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
* 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P49373 and previous config saved to /var/cache/conftool/dbconfig/20230609-125508-ladsgroup.json
* 12:50 krinkle@deploy1002: Finished scap: {{Gerrit|I385d28d2edacb37}} (duration: 06m 59s)
* 12:43 krinkle@deploy1002: Started scap: {{Gerrit|I385d28d2edacb37}}
* 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49371 and previous config saved to /var/cache/conftool/dbconfig/20230609-124002-ladsgroup.json
* 12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-add DNS for cloud-hosts-codfw vlan. - cmooney@cumin1001"
* 12:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-add DNS for cloud-hosts-codfw vlan. - cmooney@cumin1001"
* 12:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49370 and previous config saved to /var/cache/conftool/dbconfig/20230609-122303-ladsgroup.json
* 12:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49369 and previous config saved to /var/cache/conftool/dbconfig/20230609-122243-ladsgroup.json
* 12:16 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:16 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2003-dev - aborrero@cumin2002"
* 12:15 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2003-dev - aborrero@cumin2002"
* 12:13 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P49368 and previous config saved to /var/cache/conftool/dbconfig/20230609-120737-ladsgroup.json
* 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fsero out of all services on: 778 hosts
* 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P49367 and previous config saved to /var/cache/conftool/dbconfig/20230609-115230-ladsgroup.json
* 11:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Fsero out of all services on: 778 hosts
* 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fsero out of all services on: 1262 hosts
* 11:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Fsero out of all services on: 1262 hosts
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49366 and previous config saved to /var/cache/conftool/dbconfig/20230609-113724-ladsgroup.json
* 11:27 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49365 and previous config saved to /var/cache/conftool/dbconfig/20230609-112250-ladsgroup.json
* 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49364 and previous config saved to /var/cache/conftool/dbconfig/20230609-112229-ladsgroup.json
* 11:20 sukhe: pcc-db1001: sudo systemctl start pcc_facts_processor.service
* 11:14 sukhe: sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P49363 and previous config saved to /var/cache/conftool/dbconfig/20230609-110723-ladsgroup.json
* 11:02 sukhe: homer "cr*-codfw*" commit "Gerrit: 928113 add new LVS host lvs2014"
* 10:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P49362 and previous config saved to /var/cache/conftool/dbconfig/20230609-105217-ladsgroup.json
* 10:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49361 and previous config saved to /var/cache/conftool/dbconfig/20230609-103711-ladsgroup.json
* 10:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49360 and previous config saved to /var/cache/conftool/dbconfig/20230609-102217-ladsgroup.json
* 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49359 and previous config saved to /var/cache/conftool/dbconfig/20230609-102156-ladsgroup.json
* 10:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
* 10:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 10:12 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 10:09 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 10:08 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P49358 and previous config saved to /var/cache/conftool/dbconfig/20230609-100650-ladsgroup.json
* 09:57 elukey: increase <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.change-prop.transcludes.resource-change topic partitions (3->5) on kafka main clusters - [[phab:T338357|T338357]]
* 09:56 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 09:54 moritzm: installing jupyter-core security updates on bullseye
* 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P49357 and previous config saved to /var/cache/conftool/dbconfig/20230609-095144-ladsgroup.json
* 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49356 and previous config saved to /var/cache/conftool/dbconfig/20230609-093638-ladsgroup.json
* 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49355 and previous config saved to /var/cache/conftool/dbconfig/20230609-092141-ladsgroup.json
* 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 09:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 09:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49354 and previous config saved to /var/cache/conftool/dbconfig/20230609-090829-ladsgroup.json
* 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P49353 and previous config saved to /var/cache/conftool/dbconfig/20230609-085322-ladsgroup.json
* 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P49352 and previous config saved to /var/cache/conftool/dbconfig/20230609-083816-ladsgroup.json
* 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49351 and previous config saved to /var/cache/conftool/dbconfig/20230609-082310-ladsgroup.json
* 08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49350 and previous config saved to /var/cache/conftool/dbconfig/20230609-080708-ladsgroup.json
* 08:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49349 and previous config saved to /var/cache/conftool/dbconfig/20230609-080637-ladsgroup.json
* 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P49348 and previous config saved to /var/cache/conftool/dbconfig/20230609-075130-ladsgroup.json
* 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P49347 and previous config saved to /var/cache/conftool/dbconfig/20230609-073624-ladsgroup.json
* 07:33 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1492.eqiad.wmnet
* 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49346 and previous config saved to /var/cache/conftool/dbconfig/20230609-072118-ladsgroup.json
* 07:19 moritzm: powercycling restbase2018 (kernel hung following what looks like I/O errors)
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49345 and previous config saved to /var/cache/conftool/dbconfig/20230609-070520-ladsgroup.json
* 07:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 07:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49344 and previous config saved to /var/cache/conftool/dbconfig/20230609-070459-ladsgroup.json
* 06:50 moritzm: installing wireshark security updates
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P49343 and previous config saved to /var/cache/conftool/dbconfig/20230609-064953-ladsgroup.json
* 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: puppetmaster2005.codfw.wmnet
* 06:49 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: puppetmaster2005.codfw.wmnet
* 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: puppetmaster1005.eqiad.wmnet
* 06:49 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: puppetmaster1005.eqiad.wmnet
* 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus3001.esams.wmnet
* 06:48 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus3001.esams.wmnet
* 06:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
* 06:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
* 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P49342 and previous config saved to /var/cache/conftool/dbconfig/20230609-063447-ladsgroup.json
* 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49341 and previous config saved to /var/cache/conftool/dbconfig/20230609-061941-ladsgroup.json
* 06:06 eileen: config {{Gerrit|97c57848}} -> {{Gerrit|6f4a9d19}}  restart jobs
* 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49340 and previous config saved to /var/cache/conftool/dbconfig/20230609-060438-ladsgroup.json
* 06:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 06:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 05:53 eileen: civicrm upgraded from {{Gerrit|158896cc}} to {{Gerrit|5bbed553}}
* 05:52 eileen: config revision changed from {{Gerrit|8b71fa7a}} to {{Gerrit|97c57848}}
* 05:50 moritzm: installing cpio security updates
* 05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 05:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49339 and previous config saved to /var/cache/conftool/dbconfig/20230609-052315-ladsgroup.json
* 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P49338 and previous config saved to /var/cache/conftool/dbconfig/20230609-050809-ladsgroup.json
* 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P49337 and previous config saved to /var/cache/conftool/dbconfig/20230609-045302-ladsgroup.json
* 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49336 and previous config saved to /var/cache/conftool/dbconfig/20230609-043756-ladsgroup.json
* 04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49335 and previous config saved to /var/cache/conftool/dbconfig/20230609-042306-ladsgroup.json
* 04:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
* 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
* 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49334 and previous config saved to /var/cache/conftool/dbconfig/20230609-042246-ladsgroup.json
* 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P49333 and previous config saved to /var/cache/conftool/dbconfig/20230609-040739-ladsgroup.json
* 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P49332 and previous config saved to /var/cache/conftool/dbconfig/20230609-035233-ladsgroup.json
* 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49331 and previous config saved to /var/cache/conftool/dbconfig/20230609-033727-ladsgroup.json
* 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49330 and previous config saved to /var/cache/conftool/dbconfig/20230609-032127-ladsgroup.json
* 03:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
* 03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
* 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49329 and previous config saved to /var/cache/conftool/dbconfig/20230609-032106-ladsgroup.json
* 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P49328 and previous config saved to /var/cache/conftool/dbconfig/20230609-030600-ladsgroup.json
* 02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P49327 and previous config saved to /var/cache/conftool/dbconfig/20230609-025054-ladsgroup.json
* 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49326 and previous config saved to /var/cache/conftool/dbconfig/20230609-023548-ladsgroup.json
* 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49325 and previous config saved to /var/cache/conftool/dbconfig/20230609-022054-ladsgroup.json
* 02:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
* 02:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
* 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49324 and previous config saved to /var/cache/conftool/dbconfig/20230609-022034-ladsgroup.json
* 02:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1002.eqiad.wmnet with OS bullseye
* 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P49323 and previous config saved to /var/cache/conftool/dbconfig/20230609-020528-ladsgroup.json
* 02:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
* 02:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
* 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
* 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P49322 and previous config saved to /var/cache/conftool/dbconfig/20230609-015021-ladsgroup.json
* 01:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
* 01:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
* 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49321 and previous config saved to /var/cache/conftool/dbconfig/20230609-013515-ladsgroup.json
* 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS bullseye
* 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49320 and previous config saved to /var/cache/conftool/dbconfig/20230609-011945-ladsgroup.json
* 01:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
* 01:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
* 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49319 and previous config saved to /var/cache/conftool/dbconfig/20230609-011924-ladsgroup.json
* 01:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P49318 and previous config saved to /var/cache/conftool/dbconfig/20230609-010418-ladsgroup.json
* 00:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
* 00:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
* 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
* 00:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
* 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P49317 and previous config saved to /var/cache/conftool/dbconfig/20230609-004912-ladsgroup.json
* 00:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
* 00:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
* 00:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS bullseye
* 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49316 and previous config saved to /var/cache/conftool/dbconfig/20230609-003406-ladsgroup.json
* 00:31 eileen: civicrm upgraded from {{Gerrit|6f64e77d}} to {{Gerrit|158896cc}}
* 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki-root1002']
* 00:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki-root1002']
* 00:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki-root1002']
* 00:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki-root1002']
* 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49315 and previous config saved to /var/cache/conftool/dbconfig/20230609-001821-ladsgroup.json
* 00:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49314 and previous config saved to /var/cache/conftool/dbconfig/20230609-001732-ladsgroup.json
* 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P49313 and previous config saved to /var/cache/conftool/dbconfig/20230609-000226-ladsgroup.json


== 2022-08-11 ==
== 2023-06-08 ==
* 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:55 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
* 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
* 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P49312 and previous config saved to /var/cache/conftool/dbconfig/20230608-234720-ladsgroup.json
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:42 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad]
* 23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for pki-root - pt1979@cumin2002"
* 23:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add


== 2022-08-10 ==
== 2023-06-07 ==
* 21:25 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1016.eqiad.wmnet
* 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49219 and previous config saved to /var/cache/conftool/dbconfig/20230607-235624-ladsgroup.json
* 21:23 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
* 23:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 21:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: [[phab:T309810|T309810]]
* 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: [[phab:T309810|T309810]]
* 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49218 and previous config saved to /var/cache/conftool/dbconfig/20230607-235603-ladsgroup.json
* 21:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P49217 and previous config saved to /var/cache/conftool/dbconfig/20230607-234522-ladsgroup.json
* 21:09 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: [[phab:T309810|T309810]]
* 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P49216 and previous config saved to /var/cache/conftool/dbconfig/20230607-234057-ladsgroup.json
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49215 and previous config saved to /var/cache/conftool/dbconfig/20230607-233016-ladsgroup.json
* 21:00 cjming: end of UTC late backport
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P49214 and previous config saved to /var/cache/conftool/dbconfig/20230607-232551-ladsgroup.json
* 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P49213 and previous config saved to /var/cache/conftool/dbconfig/20230607-232223-ladsgroup.json
* 23:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 23:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org


== 2022-08-09 ==
== 2023-06-06 ==
* 23:17 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1011.eqiad.wmnet
* 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48961 and previous config saved to /var/cache/conftool/dbconfig/20230606-235248-ladsgroup.json
* 23:07 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48960 and previous config saved to /var/cache/conftool/dbconfig/20230606-234810-ladsgroup.json
* 23:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
* 22:51 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48959 and previous config saved to /var/cache/conftool/dbconfig/20230606-233742-ladsgroup.json
* 22:51 bking@deploy1002: helmfile [codfw] START helmfile.d/services/
* 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48958 and previous config saved to /var/cache/conftool/dbconfig/20230606-233304-ladsgroup.json
* 23:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
* 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48955 and previous config saved to /var/cache/conftool/dbconfig/20230606-232235-ladsgroup.json
* 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox


== 2022-08-08 ==
== 2023-06-05 ==
* 23:52 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: clean up testwiki experiments [[phab:T314750|T314750]] (duration: 03m 19s)
* 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48812 and previous config saved to /var/cache/conftool/dbconfig/20230605-235346-ladsgroup.json
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 23:46 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: clean up testwiki experiments [[phab:T314750|T314750]] (duration: 03m 27s)
* 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48811 and previous config saved to /var/cache/conftool/dbconfig/20230605-235310-ladsgroup.json
* 23:49 zabe@deploy1002: Finished scap: Backport for [[gerrit:927312{{!}}Stop writing to revision_comment_temp in group0 wikis (T299954)]] (duration: 07m 02s)
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P48810 and previous config saved


== 2022-08-07 ==
== 2023-06-03 ==
* 19:58 taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" {{!}} mwscript purgeList.php --wiki enwiki # [[phab:T314712|T314712]]
* 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
* 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32305 and previous config saved to /var/cache/conftool/dbconfig/20220807-135204-ladsgroup.json
* 13:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
* 13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
* 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 13:28 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
* 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32304 and previous config saved to /var/cache/conftool/dbconfig/20220807-135143-ladsgroup.json
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32303 and previous config saved to /var/cache/conftool/dbconfig/20220807-133637-ladsgroup.json
* 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32302 and previous config saved to /var/cache/conftool/dbconfig/20220807-132131-ladsgroup.json
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32301 and previous config saved to /var/cache/conftool/dbconfig/20220807-130625-ladsgroup.json
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32300 and previous config saved to /var/cache/conftool/dbconfig/20220807-120610-ladsgroup.json
* 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32299 and previous config saved to /var/cache/conftool/dbconfig/20220807-120549-ladsgroup.json
* 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32298 and previous config saved to /var/cache/conftool/dbconfig/20220807-115043-ladsgroup.json
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32297 and previous config saved to /var/cache/conftool/dbconfig/20220807-113537-ladsgroup.json
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32296 and previous config saved to /var/cache/conftool/dbconfig/20220807-112031-ladsgroup.json


== 2022-08-06 ==
== 2023-06-02 ==
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P32295 and previous config saved to /var/cache/conftool/dbconfig/20220806-175916-ladsgroup.json
* 20:16 apergos: rsync in ariel screen session, bwlimit 100000, running on dumpsdata1003, pulling from dumpsdata1002, copying over 'other dumps'
* 17:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 18:42 bblack: dns*: puppets are all re-enabled, ntp restarts are done, etc
* 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
* 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:45 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
* 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:27 bblack: dns*: disabling puppet to control rollout of NTP config fixups
* 03:02 krinkle@deploy1002: Synchronized w/: {{Gerrit|I9067d47fab0324}} (duration: 03m 25s)
* 16:03 bblack: dns*: removed faulty authdns[12]001 lines from /etc/hosts via cumin+sed
* 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:35 sukhe: restart ntp.service on dns1002
* 03:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:26 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:25 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:25 ottomata: deploying flink-operator change to dse-k8s and wikikube to add ingress for health check port - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/926479
* 02:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:24 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 13:24 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 13:24 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:03 moritzm: installing at-spi2-core bugfix updates from Bullseye point release
* 09:35 moritzm: installing texlive-security updates on buster
* 09:18 akosiaris: update kubernetes-node to 1.23.14-2 on all P:kubernetes::node hosts (88 in total) [[phab:T337836|T337836]]. Reload systemd for unit changes to take effect
* 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5016.eqsin.wmnet
* 08:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5016.eqsin.wmnet
* 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5015.eqsin.wmnet
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5015.eqsin.wmnet
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5014.eqsin.wmnet
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5014.eqsin.wmnet
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5013.eqsin.wmnet
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5013.eqsin.wmnet
* 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 0 hosts:
* 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 0 hosts:
* 08:42 moritzm: installing traceroute bugfix updates from Bullseye point release
* 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
* 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
* 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3006.wikimedia.org
* 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3006.wikimedia.org
* 07:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
* 07:22 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
* 01:53 ejegg: fundraising python tools upgraded from {{Gerrit|759d4c89}} to {{Gerrit|2ca83336}}
* 01:22 cstone: civicrm upgraded from {{Gerrit|3819d6d1}} to {{Gerrit|bcc8fccc}}


== 2022-08-05 ==
== 2023-06-01 ==
* 22:20 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly (duration: 02m 01s)
* 21:06 samtar@deploy1002: Finished scap: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]] (duration: 08m 30s)
* 22:18 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly
* 20:59 samtar@deploy1002: esanders and samtar: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 17:08 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bullseye
* 20:57 samtar@deploy1002: Started scap: Backport for [[gerrit:925858{{!}}Remove deleted config wgVectorStickyHeaderEdit (T337955)]]
* 16:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bullseye
* 20:54 samtar@deploy1002: Finished scap: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]] (duration: 10m 29s)
* 16:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 20:45 samtar@deploy1002: samtar and ksarabia: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
* 20:44 samtar@deploy1002: Started scap: Backport for [[gerrit:925792{{!}}Remove config and AB test code for edit buttons in sticky header (T337955)]]
* 16:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 20:21 samtar@deploy1002: Finished scap: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]] (duration: 07m 56s)
* 16:37 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 20:15 samtar@deploy1002: dani and samtar: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
* 20:13 samtar@deploy1002: Started scap: Backport for [[gerrit:917863{{!}}Deploy Research Incentive survey on enwiki (T336092)]]
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
* 20:12 samtar@deploy1002: Finished scap: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]] (duration: 08m 20s)
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
* 20:05 samtar@deploy1002: samtar and dreamyjazz: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:886370{{!}}Always collapse by default the CheckUserHelper on loginwiki (T328726)]]
* 16:26 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
* 19:51 ejegg: fundraising python tools upgraded from {{Gerrit|72570bdd}} to {{Gerrit|759d4c89}}
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1193.eqiad.wmnet with OS bullseye
* 19:12 mforns@deploy1002: Finished deploy [airflow-dags/analytics@21e7354]: (no justification provided) (duration: 02m 42s)
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db1192.eqiad.wmnet with OS bullseye
* 19:11 mforns@deploy1002: Started deploy [airflow-dags/analytics@21e7354]: (no justification provided)
* 16:12 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@8489923]: [[phab:T304954|T304954]]: Automate imagesuggestion imports (duration: 02m 03s)
* 19:11 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work (duration: 03m 27s)
* 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
* 19:09 bblack: lvs1* (eqiad): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 16:11 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :) (duration: 06m 09s)
* 19:08 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work
* 16:10 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@8489923]: [[phab:T304954|T304954]]: Automate imagesuggestion imports
* 18:45 bblack: lvs6* (drmrs): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
* 18:33 bblack: lvs3* (esams): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 16:07 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
* 18:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.11 refs [[phab:T337525|T337525]]
* 16:05 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :)
* 17:50 mforns@deploy1002: Finished deploy [airflow-dags/analytics@03ca1c1]: (no justification provided) (duration: 00m 10s)
* 16:04 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine (duration: 34m 38s)
* 17:50 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_drmrs and A:cp
* 16:03 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
* 17:50 mforns@deploy1002: Started deploy [airflow-dags/analytics@03ca1c1]: (no justification provided)
* 15:55 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
* 17:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
* 15:52 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1191.eqiad.wmnet with OS bullseye
* 17:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
* 15:51 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
* 17:48 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_drmrs and A:cp
* 15:42 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS bullseye
* 17:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 15:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
* 17:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
* 17:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 15:30 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine
* 17:45 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 15:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
* 17:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
* 15:25 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
* 17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
* 15:24 jbond: upload trapperkeeper-metrics-clojure to puppet7 component
* 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
* 15:22 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
* 16:55 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Remove unneeded wgEventBusStreamNamesMap override setting. Recent EventBus changes are not deployed yet? - [[phab:T336817|T336817]] (duration: 07m 24s)
* 15:19 jbond: upload puppetlabs-http-client-clojur to puppet7 component
* 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
* 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:53 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:53 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove unneeded wgEventBusStreamNamesMap override setting - [[phab:T336817|T336817]] (duration: 08m 18s)
* 15:14 dancy@deploy1002: Finished scap: Backport for [[gerrit:820653]] scap gitignore: ignore all files under the `scap` directory (duration: 04m 41s)
* 16:42 bblack: lvs2* (codfw): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 15:11 jbond: upload jolokia to puppet7 component
* 16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
* 15:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
* 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
* 15:09 dancy@deploy1002: Started scap: Backport for [[gerrit:820653]] scap gitignore: ignore all files under the `scap` directory
* 16:35 bblack: lvs5* (eqsin): upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 15:09 jbond: upload test-chuck-clojure to puppet7 component
* 16:32 bblack: lvs400[89]: upgrade pybal to 1.15.13 - [[phab:T334703|T334703]] (round 2!)
* 15:05 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
* 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 15:04 jbond: upload test-check-clojure to puppet7 component
* 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 14:57 jbond: upload nippy-clojure to puppet7 component
* 16:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 14:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
* 16:10 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
* 14:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
* 16:07 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
* 14:43 jbond: upload fressian to puppet7 component
* 16:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
* 14:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
* 16:06 mutante: gerrit - set repo wikimedia/annualreport to readonly (from active) - [[phab:T337041|T337041]]
* 14:40 jbond: upload test-generative-clojure to puppet7 component
* 16:04 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
* 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 14:34 jbond: upload data-generators-clojure to puppet7 component
* 16:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 14:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 14:23 jbond: upload encore-clojure to puppet7 component
* 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
* 14:17 jbond: upload truss-clojure to puppet7 component
* 15:45 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 14:13 jbond: upload structured-logging-clojure to puppet7 component
* 15:44 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 14:06 jbond: upload murphy-clojure to puppet7 component
* 15:33 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 13:57 jbond: upload logstash-logback-encoder-7.2 to puppet7 component
* 15:33 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 13:49 jbond: upload kitchensink-clojure to puppet7 component
* 15:21 fabfur: running run-puppet-agent on cp6010.drmrs.wmnet to fix icinga check from cookbook
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts with fragile power supply ([[phab:T314559|T314559]] [[phab:T314628|T314628]])', diff saved to https://phabricator.wikimedia.org/P32292 and previous config saved to /var/cache/conftool/dbconfig/20220805-132709-ladsgroup.json
* 15:15 bblack: lvs400[89]: upgrade pybal to 1.15.13 - [[phab:T334703|T334703]]
* 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 15:11 sukhe: reprepro -C component/pybal bullseye-wikimedia pybal_1.15.13_source.changes
* 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 15:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog1002.eqiad.wmnet with OS bullseye
* 13:09 sukhe: repool codfw
* 14:59 moritzm: installing python-sqlparse security updates
* 13:02 jbond: upload honeysql-clojure to puppet7 component
* 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
* 12:53 _joe_: progressive repool of services in codfw
* 14:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 12:24 moritzm: installing nano bugfix updates from bullseye point release
* 14:55 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 11:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
* 11:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 14:53 moritzm: installing jackson-databind security updates
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on D3 ([[phab:T310146|T310146]])', diff saved to https://phabricator.wikimedia.org/P32291 and previous config saved to /var/cache/conftool/dbconfig/20220805-113729-ladsgroup.json
* 14:49 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C6 ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32290 and previous config saved to /var/cache/conftool/dbconfig/20220805-113555-ladsgroup.json
* 14:45 fabfur: running run-puppet-agent on cp6009.drmrs.wmnet to fix icinga check from cookbook
* 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C5 ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32289 and previous config saved to /var/cache/conftool/dbconfig/20220805-113436-ladsgroup.json
* 14:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
* 10:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 14:41 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
* 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 14:40 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_drmrs and A:cp
* 10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 10:12 Amir1: dbmaint at s4@codfw ([[phab:T312863|T312863]])
* 14:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 14:36 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_drmrs and A:cp
* 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 14:34 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 14:29 moritzm: installing imagemagick security updates on buster
* 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 14:16 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog1002.eqiad.wmnet with OS bullseye
* 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 14:14 fabfur: Disabled puppet on A:cp-drmrs for [[phab:T323557|T323557]]
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
* 14:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3c9cc85]: (no justification provided) (duration: 00m 11s)
* 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
* 14:13 mforns@deploy1002: Started deploy [airflow-dags/analytics@3c9cc85]: (no justification provided)
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gerrit2002.wikimedia.org
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48700 and previous config saved to /var/cache/conftool/dbconfig/20230601-141317-ladsgroup.json
* 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.remove-downtime for gerrit2002.wikimedia.org
* 14:11 claime: Removing obsolete mediawiki-services-function-evaluator from registry - [[phab:T337505|T337505]]
* 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
* 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48699 and previous config saved to /var/cache/conftool/dbconfig/20230601-135811-ladsgroup.json
* 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
* 13:52 moritzm: installing sysstat security updates
* 00:18 mutante: restarting gerrit for config change - removing old replica [[phab:T313250|T313250]]
* 13:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 13:51 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 13:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 13:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 13:49 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 13:49 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48698 and previous config saved to /var/cache/conftool/dbconfig/20230601-134304-ladsgroup.json
* 13:29 moritzm: installing openssl security updates on bullseye
* 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48697 and previous config saved to /var/cache/conftool/dbconfig/20230601-132758-ladsgroup.json
* 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48695 and previous config saved to /var/cache/conftool/dbconfig/20230601-132319-ladsgroup.json
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T336886|T336886]])', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230601-132238-ladsgroup.json
* 13:21 claime: Removing obsolete mediawiki-services-function-orchestrator from registry - [[phab:T337505|T337505]]
* 13:13 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]] (duration: 11m 08s)
* 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48694 and previous config saved to /var/cache/conftool/dbconfig/20230601-130732-ladsgroup.json
* 13:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
* 13:04 urbanecm@deploy1002: urbanecm and daimona: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 13:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
* 13:02 urbanecm@deploy1002: Started scap: Backport for [[gerrit:925766{{!}}beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362)]], [[gerrit:923305{{!}}Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)]]
* 12:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 12:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 12:52 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48693 and previous config saved to /var/cache/conftool/dbconfig/20230601-125226-ladsgroup.json
* 12:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 12:49 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48692 and previous config saved to /var/cache/conftool/dbconfig/20230601-123720-ladsgroup.json
* 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48691 and previous config saved to /var/cache/conftool/dbconfig/20230601-123236-ladsgroup.json
* 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
* 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
* 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48690 and previous config saved to /var/cache/conftool/dbconfig/20230601-122900-ladsgroup.json
* 12:17 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:17 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:16 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 12:16 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48689 and previous config saved to /var/cache/conftool/dbconfig/20230601-121354-ladsgroup.json
* 12:03 Daimona: Creating ce_tracking_tools table for the CampaignEvents extension on testwiki, test2wiki, officewiki, and metawiki # [[phab:T336365|T336365]]
* 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48688 and previous config saved to /var/cache/conftool/dbconfig/20230601-115848-ladsgroup.json
* 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48687 and previous config saved to /var/cache/conftool/dbconfig/20230601-114342-ladsgroup.json
* 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48686 and previous config saved to /var/cache/conftool/dbconfig/20230601-113843-ladsgroup.json
* 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
* 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48685 and previous config saved to /var/cache/conftool/dbconfig/20230601-113822-ladsgroup.json
* 11:28 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 11:28 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 11:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:25 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48684 and previous config saved to /var/cache/conftool/dbconfig/20230601-112316-ladsgroup.json
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48683 and previous config saved to /var/cache/conftool/dbconfig/20230601-110810-ladsgroup.json
* 11:04 jayme: disabling puppet on all Kubernetes control planes for https://gerrit.wikimedia.org/r/c/operations/puppet/+/925707
* 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48682 and previous config saved to /var/cache/conftool/dbconfig/20230601-105303-ladsgroup.json
* 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48681 and previous config saved to /var/cache/conftool/dbconfig/20230601-104803-ladsgroup.json
* 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
* 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48680 and previous config saved to /var/cache/conftool/dbconfig/20230601-104742-ladsgroup.json
* 10:45 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48679 and previous config saved to /var/cache/conftool/dbconfig/20230601-103236-ladsgroup.json
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48678 and previous config saved to /var/cache/conftool/dbconfig/20230601-101730-ladsgroup.json
* 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 10:16 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 10:14 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48677 and previous config saved to /var/cache/conftool/dbconfig/20230601-100224-ladsgroup.json
* 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48676 and previous config saved to /var/cache/conftool/dbconfig/20230601-100011-ladsgroup.json
* 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
* 09:56 moritzm: installing systemd security updates on bullseye
* 09:53 Amir1: ladsgroup@mwmaint1002:~$ foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateActorsAF.php ([[phab:T336224|T336224]])
* 09:52 gehel: cleaning apt archives on an-test-worker1002: `sudo apt-get clean`, recovering 14G
* 09:49 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 09:43 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2004-dev']
* 09:36 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
* 09:36 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol2004-dev']
* 09:35 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
* 09:32 volans: installed spicerack v7.2.0 on cumin2002
* 09:30 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 09:21 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
* 09:18 godog: remove lv prometheus-global - [[phab:T288196|T288196]]
* 09:17 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
* 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
* 09:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:16 volans@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
* 09:13 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
* 09:12 volans: installed spicerack v7.2.0 on cumin1001
* 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
* 09:07 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
* 09:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
* 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
* 08:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
* 08:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
* 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
* 08:53 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
* 08:49 aborrero@cumin1001: START - Cookbook sre.dns.netbox
* 08:48 apergos: UTC morning backport and config training window done
* 08:30 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 08:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 08:28 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:28 daniel@deploy1002: Finished scap: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]] (duration: 10m 12s)
* 08:28 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:19 daniel@deploy1002: daniel and isaranto: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:18 daniel@deploy1002: Started scap: Backport for [[gerrit:922512{{!}}ORES: add model versions configuration and thresholds (T319170)]]
* 07:55 daniel@deploy1002: Finished scap: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]] (duration: 09m 09s)
* 07:48 daniel@deploy1002: daniel: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:46 daniel@deploy1002: Started scap: Backport for [[gerrit:923588{{!}}Enable parser cache warming jobs for parsoid on frwiki (T329366)]]
* 07:42 mlitn@deploy1002: Finished scap: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]] (duration: 33m 02s)
* 07:35 moritzm: installing libssh security updates
* 07:29 mlitn@deploy1002: mlitn: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
* 07:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
* 07:09 mlitn@deploy1002: Started scap: Backport for [[gerrit:917871{{!}}Add $wgInterwikiLogoOverride (T315269)]]
* 06:16 kart_: Updated MinT to 2023-06-01-041041-production ([[phab:T336525|T336525]])
* 06:01 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 05:49 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 05:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 05:44 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 05:42 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
* 05:39 kart_: Updated cxserver to 2023-06-01-041016-production ([[phab:T337669|T337669]])
* 05:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:34 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:32 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 00:11 eileen: civicrm upgraded from {{Gerrit|885208ca}} to {{Gerrit|3819d6d1}}


== 2022-08-04 ==
* 23:07 mutante: switching gerrit-replica.wikimedia.org to new machine gerrit2002, dropping gerrit-replica-new.wikimedia.org [[phab:T313250|T313250]]
* 21:07 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:56 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:819774]] tkwiki: Update wordmark (duration: 06m 12s)
* 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 20:51 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 20:50 thcipriani@deploy1002: Started scap: Backport for [[gerrit:819774]] tkwiki: Update wordmark
* 20:48 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:812391]] [config]: Add click event logging for mobile and desktop (duration: 39m 16s)
* 20:45 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:24 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 20:23 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:22 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:13 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 20:10 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 20:09 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 20:08 thcipriani@deploy1002: Started scap: Backport for [[gerrit:812391]] [config]: Add click event logging for mobile and desktop
* 19:59 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 19:55 dancy@deploy1002: rebuilt and synchronized wikiversions files: resync
* 19:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-be2001.codfw.wmnet
* 19:49 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for thanos-be2001.codfw.wmnet
* 19:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 8 hosts
* 19:44 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 8 hosts
* 19:42 Emperor: rebooting thanos-be2001 to fix drive ordering
* 19:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2071.codfw.wmnet
* 19:37 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2071.codfw.wmnet
* 19:31 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2071.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 19:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2071.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:12 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
* 19:11 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
* 19:11 dancy: There were many errors during php-fpm restart due to failure to contact http://lvs2009:9090/pools/appservers-https_443/mw2361.codfw.wmnet and the like.
* 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]]
* 19:09 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
* 19:09 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
* 19:05 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
* 19:04 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
* 19:04 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
* 19:03 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
* 19:03 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
* 19:02 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
* 19:02 ottomata: roll-restarting eventgate-analytics-external to pick up backwards incompatible schema change - [[phab:T314151|T314151]]
* 18:47 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 18:46 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 18:41 cwhite: poweroff kafka-logging2003 - [[phab:T310145|T310145]]
* 18:39 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw237[0-6].codfw.wmnet
* 18:39 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts
* 18:39 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for 7 hosts
* 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2369.codfw.wmnet
* 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2369.codfw.wmnet
* 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2368.codfw.wmnet
* 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2368.codfw.wmnet
* 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2367.codfw.wmnet
* 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2367.codfw.wmnet
* 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2369.codfw.wmnet
* 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2368.codfw.wmnet
* 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2367.codfw.wmnet
* 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2366.codfw.wmnet
* 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2366.codfw.wmnet
* 18:34 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2366.codfw.wmnet
* 18:30 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2279.codfw.wmnet
* 18:30 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2278.codfw.wmnet
* 18:29 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2277.codfw.wmnet
* 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2276.codfw.wmnet
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2276.codfw.wmnet
* 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2275.codfw.wmnet
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2275.codfw.wmnet
* 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2274.codfw.wmnet
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2274.codfw.wmnet
* 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2273.codfw.wmnet
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2273.codfw.wmnet
* 18:26 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 02m 39s)
* 18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2272.codfw.wmnet
* 18:24 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2272.codfw.wmnet
* 18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2271.codfw.wmnet
* 18:24 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2271.codfw.wmnet
* 18:23 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 18:23 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 32s)
* 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2276.codfw.wmnet
* 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2275.codfw.wmnet
* 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2274.codfw.wmnet
* 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2273.codfw.wmnet
* 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet
* 18:22 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,68].codfw.wmnet PDU work [[phab:T310145|T310145]]
* 18:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
* 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 18:21 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
* 18:21 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s)
* 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 18:21 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s)
* 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 18:20 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 49s)
* 18:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
* 18:20 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
* 18:19 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 18:14 mutante: mw2272 and upwards: scap pull, checking monitoring, repooling.. one by one
* 18:13 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet
* 18:12 btullis@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 51s)
* 18:11 btullis@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 18:06 btullis@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 54s)
* 18:04 btullis@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade
* 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade
* 17:43 mutante: maps2008 - downtime and shutdown for D3 maintenance
* 17:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots
* 17:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots
* 17:42 mutante: thumbor2006 - downtime and shutdown for D3 maintenance
* 17:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots
* 17:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots
* 17:39 mutante: mw2386 - systemctl reset-failed
* 17:31 mutante: phab2001 - systemctl restart ssh-phab, attempting to clear Icinga pybal alerts, related to reboots
* 17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
* 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
* 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
* 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
* 17:28 Amir1: dbmaint at s4@eqiad ([[phab:T312863|T312863]])
* 17:26 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:26 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:24 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:23 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:23 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:23 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:20 mutante: [an-launcher1002:~] $ sudo systemctl reset-failed
* 17:20 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=ms-fe2012.codfw.wmnet
* 17:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=varnish-fe
* 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=ats-be
* 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=ats-tls
* 17:16 Emperor: shutdown of moss-fe2002.codfw.wmnet,ms-be20[37,38,43,61,65,69].codfw.wmnet,ms-fe2012.codfw.wmnet,thanos-fe2003.codfw.wmnet for power work [[phab:T310146|T310146]]
* 17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[2035-2036].codfw.wmnet with reason: shutdown for PDU upgrade
* 17:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[2035-2036].codfw.wmnet with reason: shutdown for PDU upgrade
* 17:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: PDU work
* 17:15 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: PDU work
* 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
* 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
* 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
* 17:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet
* 17:13 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet
* 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=varnish-fe
* 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=ats-be
* 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=ats-tls
* 17:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2050.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2050.codfw.wmnet with reason: [[phab:T310146|T310146]]
* 17:11 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288] (duration: 00m 04s)
* 17:11 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288]
* 17:11 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288] (duration: 00m 07s)
* 17:10 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288]
* 17:10 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 15s)
* 17:09 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 17:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: shutdown for PDU upgrade
* 17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: shutdown for PDU upgrade
* 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
* 16:51 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288] (duration: 07m 14s)
* 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2016.codfw.wmnet
* 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase202[05].codfw.wmnet
* 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase202[05].codfw.wmnet
* 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 16:43 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288]
* 16:43 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288] (duration: 00m 07s)
* 16:43 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288]
* 16:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 18 hosts
* 16:37 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 18 hosts
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2059.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 16:35 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2059.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 16:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2003.codfw.wmnet with reason: PDU swap
* 16:34 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 20s)
* 16:34 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2003.codfw.wmnet with reason: PDU swap
* 16:34 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 16:32 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 29m 59s)
* 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D3 for PDU maint', diff saved to https://phabricator.wikimedia.org/P32286 and previous config saved to /var/cache/conftool/dbconfig/20220804-163037-ladsgroup.json
* 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820376{{!}}Start reading from new templatelinks columns in commons (T306673)]] (duration: 03m 00s)
* 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:17 brett: deploying authdns - geodns: Map out African countries by DC latency ([[phab:T311472|T311472]])
* 16:12 cwhite: poweroff logstash2028 - [[phab:T310145|T310145]]
* 16:06 Emperor: shutdown ms-be20[39,49,54].codfw.wmnet,thanos-be2003 for PDU swap [[phab:T310145|T310145]]
* 16:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet with reason: PDU work
* 16:02 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet with reason: PDU work
* 16:02 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
* 15:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2048.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2048.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 15:43 damilare: payments-wiki upgraded from {{Gerrit|0e4a5b3b}} to {{Gerrit|6880236d}}
* 15:37 _joe_: uncordoning ml-serve200<nowiki>{</nowiki>1,6<nowiki>}</nowiki>
* 15:27 sukhe: power off cp2037,cp2038: PDU upgrade
* 15:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:30:00 on phab2001.codfw.wmnet with reason: PDU swap
* 15:25 jelto: power off phab2001
* 15:25 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:30:00 on phab2001.codfw.wmnet with reason: PDU swap
* 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[2037-2038].codfw.wmnet with reason: shutdown for PDU upgrade
* 15:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[2037-2038].codfw.wmnet with reason: shutdown for PDU upgrade
* 15:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=varnish-fe
* 15:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=ats-be
* 15:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=ats-tls
* 15:21 XioNoX: un-drain codfw-ulsfo link - [[phab:T310310|T310310]]
* 15:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2116,2127,2167-2168].codfw.wmnet,es2022.codfw.wmnet with reason: Maintenance ([[phab:T310145|T310145]])
* 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db[2116,2127,2167-2168].codfw.wmnet,es2022.codfw.wmnet with reason: Maintenance ([[phab:T310145|T310145]])
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool C6 for PDU maint ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32285 and previous config saved to /var/cache/conftool/dbconfig/20220804-151958-ladsgroup.json
* 15:16 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 15:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on restbase[2016,2020,2025].codfw.wmnet with reason: PDU maintenance
* 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on restbase[2016,2020,2025].codfw.wmnet with reason: PDU maintenance
* 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2114,2126,2166].codfw.wmnet with reason: Maintenance ([[phab:T310145|T310145]])
* 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db[2114,2126,2166].codfw.wmnet with reason: Maintenance ([[phab:T310145|T310145]])
* 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=varnish-fe
* 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=ats-be
* 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=ats-tls
* 15:12 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2058,2064].codfw.wmnet
* 15:12 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2058,2064].codfw.wmnet
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts for PDU maint ([[phab:T310145|T310145]])', diff saved to https://phabricator.wikimedia.org/P32284 and previous config saved to /var/cache/conftool/dbconfig/20220804-151121-ladsgroup.json
* 15:09 godog: poweroff logstash2002 - [[phab:T310145|T310145]]
* 15:07 _joe_: powering down mc203<nowiki>{</nowiki>0,1<nowiki>}</nowiki>
* 15:07 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on logstash2002.codfw.wmnet with reason: pdu
* 15:06 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on logstash2002.codfw.wmnet with reason: pdu
* 15:05 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 14:58 jelto: power off mc20[30-31]
* 14:56 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc[2030-2031].codfw.wmnet with reason: PDU swap
* 14:56 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc[2030-2031].codfw.wmnet with reason: PDU swap
* 14:56 XioNoX: draining codfw-ulsfo link - [[phab:T310310|T310310]]
* 14:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2025.codfw.wmnet
* 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2020.codfw.wmnet
* 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2016.codfw.wmnet
* 14:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2011.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 14:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2011.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 14:25 jelto: power off gitlab-runner2003
* 14:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on gitlab-runner2003.codfw.wmnet with reason: PDU swap
* 14:25 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2001.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2001.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 14:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on gitlab-runner2003.codfw.wmnet with reason: PDU swap
* 14:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2032.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 14:22 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2032.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 14:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on logstash2035.codfw.wmnet with reason: pdu
* 14:22 godog: poweroff logstash2035 - [[phab:T310145|T310145]]
* 14:22 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on logstash2035.codfw.wmnet with reason: pdu
* 14:21 Emperor: shutdown ms-be20[58,64].codfw.wmnet for PDU swap [[phab:T310145|T310145]]
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:14 Lucas_WMDE: UTC afternoon backport+config window done
* 14:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:820454{{!}}Remove unused $wgMathUseRestBase (T274436)]] (duration: 03m 01s)
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:820254{{!}}CommonSettings-labs: Fix usage of $wgSFSValidateIPListLocationMD5]] (duration: 02m 51s)
* 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2033.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 14:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2033.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:820255{{!}}wikitech: Remove old LDAP config vars]] (duration: 02m 54s)
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2058,2064].codfw.wmnet with reason: PDU work
* 13:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2058,2064].codfw.wmnet with reason: PDU work
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:820404{{!}}Remove unused $wgIncludejQueryMigrate (T280944)]] (2/2) (duration: 03m 03s)
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:45 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820404{{!}}Remove unused $wgIncludejQueryMigrate (T280944)]] (1/2) (duration: 02m 58s)
* 13:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:40 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2066.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 13:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2066.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:820402{{!}}Remove unused $wgLegacyJavaScriptGlobals (T72470)]] (2/2) (duration: 02m 59s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:820402{{!}}Remove unused $wgLegacyJavaScriptGlobals (T72470)]] (1/2) (duration: 02m 58s)
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForSDC.php: Config: [[gerrit:820397{{!}}Remove unused $wgWBCSEnableDispatchingQueryBuilder]] (duration: 03m 01s)
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:17 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820441{{!}}Remove unused CA P3P config]] (duration: 03m 09s)
* 13:14 jbond: introduce new puppetmaster backends puppetmaster[12]004
* 13:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 13:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: [[phab:T310145|T310145]]
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:819175{{!}}QuickSurveys: Deploy research incentive survey to Bengali wiki (T314333)]] (duration: 03m 26s)
* 13:07 moritzm: installing jetty9 security updates
* 12:48 moritzm: installing Linux 4.19.249 kernels on Buster hosts
* 12:03 jbond: send sretest100[12] and idp-test2001 to the new puppetmaster[12]004 servers to test
* 11:46 moritzm: installing Linux 5.10.127-2 kernels on Bullseye hosts
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2017.codfw.wmnet to cluster codfw and group D
* 11:41 moritzm: installing libpgjava security updates
* 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2017.codfw.wmnet to cluster codfw and group D
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
* 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2017.codfw.wmnet with OS bullseye
* 10:53 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2015.codfw.wmnet
* 10:53 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2022.codfw.wmnet
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2017.codfw.wmnet with reason: host reimage
* 10:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2017.codfw.wmnet with reason: host reimage
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2017.codfw.wmnet with OS bullseye
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 10:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 9:00:00 on 32 hosts with reason: PDU swap
* 10:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 9:00:00 on 32 hosts with reason: PDU swap
* 10:03 Lucas_WMDE: stashbot temporarily parted and lost several logs between 9:42 UTC and 9:49 UTC; mainly mwdebug helmfile start/done, also ayounsi sre.deploy.python-code cookbook to cumin1001, cumin2002; see IRC logs
* 10:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update requirements + wmf-netbox - ayounsi@cumin1001
* 10:00 jynus: stop db2099 [[phab:T310145|T310145]]
* 10:00 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update requirements + wmf-netbox - ayounsi@cumin1001
* 09:39 jelto: power off mw22[71-79].codfw.wmnet
* 09:38 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/includes/EventLogging/SpecialEditGrowthConfigLogger.php: {{Gerrit|ba67dd940217e9f786f4349b4da0fe088475fde9}}: SpecialEditGrowthConfigLogger: Update schema version ([[phab:T314173|T314173]], [[phab:T312148|T312148]]) (duration: 03m 18s)
* 09:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2177 to s3 [[phab:T311494|T311494]]', diff saved to https://phabricator.wikimedia.org/P32282 and previous config saved to /var/cache/conftool/dbconfig/20220804-093704-marostegui.json
* 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddcd333015bb58a98709a5005a5db7e8519dd0a5}}: testwiki: Growth: Assign enrollasmentor to * ([[phab:T310905|T310905]]) (duration: 03m 41s)
* 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:32 jelto: set/pooled=inactive mw22[71-79].codfw.wmnet
* 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 9:30:00 on 9 hosts with reason: PDU swap
* 09:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 9:30:00 on 9 hosts with reason: PDU swap
* 09:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: wmf-netbox.py update - ayounsi@cumin1001
* 09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2089.codfw.wmnet
* 09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0614a39bf15252c95a96565dd7c986237f3d3323}}: testwiki: Growth: Switch to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 38s)
* 09:25 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: wmf-netbox.py update - ayounsi@cumin1001
* 09:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:23 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 09:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2089.codfw.wmnet
* 09:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes2022.codfw.wmnet
* 09:03 oblivian@mwmaint1002: pull aborted:  (duration: 00m 06s)
* 08:58 moritzm: installing gsasl security updates
* 08:57 oblivian@mwmaint1002: pull aborted:  (duration: 00m 18s)
* 08:48 moritzm: draining ganeti2017 [[phab:T311686|T311686]]
* 08:45 jelto: power off kubernetes2022
* 08:43 oblivian@deploy1002: Synchronized README: testing new scap configuration (duration: 03m 18s)
* 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:22:00 on kubernetes2022.codfw.wmnet with reason: PDU swap
* 08:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:22:00 on kubernetes2022.codfw.wmnet with reason: PDU swap
* 08:37 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2022.codfw.wmnet
* 08:35 jelto: kubectl drain kubernetes2022.codfw.wmnet
* 08:32 jelto: kubectl cordon kubernetes2022.codfw.wmnet
* 08:28 moritzm: imported gsasl 1.8.0-8+wmf1 to stretch-wikimedia
* 08:26 jelto: power off mc2049 and mc2050
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:36:00 on mc[2049-2050].codfw.wmnet with reason: PDU swap
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:36:00 on mc[2049-2050].codfw.wmnet with reason: PDU swap
* 08:22 oblivian@mwmaint1002: pull aborted:  (duration: 00m 11s)
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132, db111, db1127, db1143', diff saved to https://phabricator.wikimedia.org/P32281 and previous config saved to /var/cache/conftool/dbconfig/20220804-081958-root.json
* 08:19 jelto: power off mc2047 and mc2048
* 08:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:45:00 on mc[2047-2048].codfw.wmnet with reason: PDU swap
* 08:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:45:00 on mc[2047-2048].codfw.wmnet with reason: PDU swap
* 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 08:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:55 marostegui: Remove grants for 208.80.154.160/208.80.155.109 [[phab:T314528|T314528]]
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2089 from dbctl [[phab:T313799|T313799]]', diff saved to https://phabricator.wikimedia.org/P32280 and previous config saved to /var/cache/conftool/dbconfig/20220804-074957-marostegui.json
* 07:47 godog: grow sda/sdb 3 by 100G on thanos-be2002 - [[phab:T314275|T314275]]
* 07:46 godog: grow sda/sdb 3 by 100G on thanos-be1003 - [[phab:T314275|T314275]]
* 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2030.codfw.wmnet to cluster codfw and group A
* 07:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to cluster codfw and group A
* 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2135,2160].codfw.wmnet with reason: codfw pdu maintenance
* 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2135,2160].codfw.wmnet with reason: codfw pdu maintenance
* 07:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
* 07:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2030.codfw.wmnet to cluster codfw and group A
* 07:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to cluster codfw and group A
* 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2023-2025].codfw.wmnet with reason: codfw pdu maintenance
* 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2023-2025].codfw.wmnet with reason: codfw pdu maintenance
* 07:05 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
* 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:58 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 06:58 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 06:58 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
* 06:06 _joe_: restarted memcached on mc2038 to pick up the actual production configuration
* 05:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2030.codfw.wmnet with OS bullseye
* 05:49 kart_: Updated cxserver to 2022-08-04-022612-production ([[phab:T313296|T313296]], [[phab:T308248|T308248]])
* 05:44 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:43 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2030.codfw.wmnet with reason: host reimage
* 05:39 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:38 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2030.codfw.wmnet with reason: host reimage
* 05:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2030.codfw.wmnet with OS bullseye
* 05:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2030.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 05:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2030.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 04:38 ejegg: payments-wiki upgraded from {{Gerrit|712df4ce}} to {{Gerrit|0e4a5b3b}}
* 04:29 TimStarling: on mw2377 fiddling with CPU frequency control and doing benchmarks
* 04:09 krinkle@mwmaint1002: pull aborted:  (duration: 00m 05s)
* 01:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32278 and previous config saved to /var/cache/conftool/dbconfig/20220804-012341-marostegui.json
* 01:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32277 and previous config saved to /var/cache/conftool/dbconfig/20220804-010834-marostegui.json
* 00:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32276 and previous config saved to /var/cache/conftool/dbconfig/20220804-005328-marostegui.json
* 00:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32275 and previous config saved to /var/cache/conftool/dbconfig/20220804-003822-marostegui.json
* 00:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32274 and previous config saved to /var/cache/conftool/dbconfig/20220804-003611-marostegui.json
* 00:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 00:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32273 and previous config saved to /var/cache/conftool/dbconfig/20220804-003549-marostegui.json
* 00:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32272 and previous config saved to /var/cache/conftool/dbconfig/20220804-002043-marostegui.json
* 00:06 mutante: gerrit - [2022-08-04 00:05:33,173] Replication to gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/analytics/geowiki.git started.. [[phab:T313250|T313250]]
* 00:06 mutante: gerrit - [2022-08-04 00:05:33,173] Replication to gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/analytics/geowiki.git started... [CONTEXT pushOneId="83ad5008" ]
* 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32271 and previous config saved to /var/cache/conftool/dbconfig/20220804-000536-marostegui.json
* 00:03 mutante: gerrit - service restart to deploy config change to add second replica [[phab:T313250|T313250]]
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit.wikimedia.org with reason: service restart
* 00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit.wikimedia.org with reason: service restart
* 00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
== 2022-08-03 ==
* 23:59 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
* 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32270 and previous config saved to /var/cache/conftool/dbconfig/20220803-235030-marostegui.json
* 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32269 and previous config saved to /var/cache/conftool/dbconfig/20220803-225015-marostegui.json
* 22:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
* 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32268 and previous config saved to /var/cache/conftool/dbconfig/20220803-224827-marostegui.json
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32267 and previous config saved to /var/cache/conftool/dbconfig/20220803-223321-marostegui.json
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32266 and previous config saved to /var/cache/conftool/dbconfig/20220803-221815-marostegui.json
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32265 and previous config saved to /var/cache/conftool/dbconfig/20220803-220309-marostegui.json
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32264 and previous config saved to /var/cache/conftool/dbconfig/20220803-220057-marostegui.json
* 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32263 and previous config saved to /var/cache/conftool/dbconfig/20220803-220007-marostegui.json
* 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32262 and previous config saved to /var/cache/conftool/dbconfig/20220803-214501-marostegui.json
* 21:44 damilare: payments-wiki updated from {{Gerrit|e1b6036a}} to {{Gerrit|712df4ce}}
* 21:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - [[phab:T314078|T314078]]
* 21:35 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
* 21:35 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
* 21:30 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 21:30 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32261 and previous config saved to /var/cache/conftool/dbconfig/20220803-212955-marostegui.json
* 21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32260 and previous config saved to /var/cache/conftool/dbconfig/20220803-211449-marostegui.json
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32259 and previous config saved to /var/cache/conftool/dbconfig/20220803-211237-marostegui.json
* 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32258 and previous config saved to /var/cache/conftool/dbconfig/20220803-211216-marostegui.json
* 21:03 ejegg: updated standalone SmashPig deployment from {{Gerrit|8e8f0017}} to {{Gerrit|9b97ea15}}
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32257 and previous config saved to /var/cache/conftool/dbconfig/20220803-205710-marostegui.json
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:55 ebernhardson@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: [[gerrit:820223{{!}}cirrus: Set ElasticaWrite partition count for cloudelastic to 3]] (duration: 03m 29s)
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/includes/VisualEditorParsoidClient.php: {{Gerrit|a804fe18f1e14795ba7836d3ebf6c361bb1538a7}}: Update call to PageConfigFactory::create to use new signature ([[phab:T314523|T314523]]) (duration: 03m 25s)
* 20:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32256 and previous config saved to /var/cache/conftool/dbconfig/20220803-204204-marostegui.json
* 20:39 urbanecm@deploy1002: sync-file aborted: {{Gerrit|a804fe18f1e14795ba7836d3ebf6c361bb1538a7}}: Update call to PageConfigFactory::create to use new signature ([[phab:T314523|T314523]]) (duration: 00m 00s)
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/: {{Gerrit|b840eef86837aed3e566885110e93b2ca9ab5f42}}: Fix ReplyLinksController#teardown (duration: 03m 27s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/: {{Gerrit|70a18f5846111a0dfe8ba473daf384cbb8e88804}}: Add explicit partitioning key to ElasticaWrite ([[phab:T314426|T314426]]) (duration: 03m 13s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/CirrusSearch/: {{Gerrit|9961e9bc8f5873f8ddc8a11108de0a7bfcb14ae6}}: Add explicit partitioning key to ElasticaWrite ([[phab:T314426|T314426]]) (duration: 03m 23s)
* 20:28 cwhite@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host logstash2032.codfw.wmnet
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32255 and previous config saved to /var/cache/conftool/dbconfig/20220803-202658-marostegui.json
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32254 and previous config saved to /var/cache/conftool/dbconfig/20220803-202146-marostegui.json
* 20:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 20:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32253 and previous config saved to /var/cache/conftool/dbconfig/20220803-202125-marostegui.json
* 20:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 20:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|195f8090b9694be65c937cea108ff4f6400972ec}}: Start writing to cuc_actor on test wikis ([[phab:T233004|T233004]]) (duration: 03m 27s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2032.codfw.wmnet on all recursors
* 20:08 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2032.codfw.wmnet on all recursors
* 20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:07 mutante: gerrit - adding second replica [[phab:T313250|T313250]]
* 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32252 and previous config saved to /var/cache/conftool/dbconfig/20220803-200619-marostegui.json
* 20:04 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 20:03 cwhite@cumin2002: START - Cookbook sre.ganeti.makevm for new host logstash2032.codfw.wmnet
* 20:00 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2012.codfw.wmnet
* 20:00 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2012.codfw.wmnet
* 20:00 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2012.codfw.wmnet
* 19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32251 and previous config saved to /var/cache/conftool/dbconfig/20220803-195113-marostegui.json
* 19:40 ryankemper: [[phab:T314078|T314078]] Forgot to mention, restart is at `ryankemper@cumin1001` tmux session `codfw_restarts`
* 19:39 ryankemper: [[phab:T314078|T314078]] Rolling upgrade of codfw hosts; after this all of eqiad/codfw will have the new plugin version and we can resume the `search-loader` instances: `sudo -E cookbook sre.elasticsearch.rolling-operation search_codfw "codfw cluster plugin upgrade" --upgrade --nodes-per-run 3 --start-datetime 2022-08-03T19:38:10 --task-id [[phab:T314078|T314078]]`
* 19:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - [[phab:T314078|T314078]]
* 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32250 and previous config saved to /var/cache/conftool/dbconfig/20220803-193607-marostegui.json
* 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32249 and previous config saved to /var/cache/conftool/dbconfig/20220803-193354-marostegui.json
* 19:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 19:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32248 and previous config saved to /var/cache/conftool/dbconfig/20220803-193334-marostegui.json
* 19:25 mutante: gerrit1001 - rsyncing /var/lib/gerrit/review_site/ over to gerrit2002 815401
* 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32247 and previous config saved to /var/cache/conftool/dbconfig/20220803-191828-marostegui.json
* 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32246 and previous config saved to /var/cache/conftool/dbconfig/20220803-190321-marostegui.json
* 18:56 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2011.codfw.wmnet
* 18:56 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2011.codfw.wmnet
* 18:56 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2011.codfw.wmnet
* 18:33 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2027,2037].codfw.wmnet
* 18:33 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2027,2037].codfw.wmnet
* 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:16 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]] (duration: 03m 37s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]]
* 17:58 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestage2002.codfw.wmnet
* 17:58 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubestage2002.codfw.wmnet
* 17:57 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2025-2026].codfw.wmnet
* 17:57 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2025-2026].codfw.wmnet
* 17:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2044.codfw.wmnet
* 17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2044.codfw.wmnet
* 17:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2043.codfw.wmnet
* 17:56 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2043.codfw.wmnet
* 17:55 ottomata: increasing partitions from 5 to 6 for *.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite topics in Kafka main-eqiad and main-codfw - [[phab:T314426|T314426]]
* 17:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2055.codfw.wmnet
* 17:55 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2055.codfw.wmnet
* 17:50 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=kubestage2002.codfw.wmnet
* 17:38 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2008-2010].codfw.wmnet
* 17:38 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2008-2010].codfw.wmnet
* 17:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase20[12]4.codfw.wmnet
* 17:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
* 17:14 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
* 17:08 ryankemper: [[phab:T310145|T310145]] `elastic2031` and `wcqs2002` powered off in preparation for C1 maintenance
* 17:06 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=(kubernetes2020.codfw.wmnet{{!}}kubernetes2009.codfw.wmnet{{!}}kubernetes2010.codfw.wmnet)
* 17:00 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 16:48 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,55,68].codfw.wmnet PDU work [[phab:T310145|T310145]]
* 16:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
* 16:47 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
* 16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
* 16:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
* 16:46 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
* 16:40 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2046.codfw.wmnet
* 16:40 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2046.codfw.wmnet
* 16:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 10 hosts
* 16:39 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 10 hosts
* 16:38 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2023.codfw.wmnet
* 16:38 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2023.codfw.wmnet
* 16:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
* 16:37 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
* 16:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
* 16:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
* 16:32 jelto: power off mc2025-2026
* 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for rdb2008.codfw.wmnet
* 16:30 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for rdb2008.codfw.wmnet
* 16:28 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2009-2010,2020].codfw.wmnet
* 16:27 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes[2009-2010,2020].codfw.wmnet
* 16:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 12 hosts
* 16:11 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for 12 hosts
* 16:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
* 16:08 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 15 hosts
* 16:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs[2005-2008].codfw.wmnet
* 16:08 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs[2005-2008].codfw.wmnet
* 15:59 Emperor: shutdown ms-be20[33,47],thanos-be2002 prior to PDU work [[phab:T310070|T310070]]
* 15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
* 15:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
* 15:52 jelto: pooling mw2259-2270 again
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32242 and previous config saved to /var/cache/conftool/dbconfig/20220803-154515-marostegui.json
* 15:38 vgutierrez: clearing ats-be cache on cp6008 - [[phab:T309651|T309651]]
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:36 elukey: powercycle kafka-logging2003 - not responsive to serial console
* 15:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: {{Gerrit|4438957e78e0012aff646e52dc16a4fb796cfd6b}}: ServiceImageRecommendationProvider: Add extra logging when no JSON response received ([[phab:T313973|T313973]]) (duration: 03m 04s)
* 15:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
* 15:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
* 15:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
* 15:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
* 15:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
* 15:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2024.codfw.wmnet
* 15:30 vgutierrez: clearing ats-be cache on cp6016 - [[phab:T309651|T309651]]
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32241 and previous config saved to /var/cache/conftool/dbconfig/20220803-153009-marostegui.json
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.eqsin.wmnet on all recursors
* 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.eqsin.wmnet on all recursors
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.ulsfo.wmnet on all recursors
* 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.ulsfo.wmnet on all recursors
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.codfw.wmnet on all recursors
* 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.codfw.wmnet on all recursors
* 15:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet
* 15:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 15:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32240 and previous config saved to /var/cache/conftool/dbconfig/20220803-151502-marostegui.json
* 15:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
* 15:10 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
* 15:04 jelto: power off mc2023
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32239 and previous config saved to /var/cache/conftool/dbconfig/20220803-145956-marostegui.json
* 14:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
* 14:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32238 and previous config saved to /var/cache/conftool/dbconfig/20220803-145849-marostegui.json
* 14:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 14:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32237 and previous config saved to /var/cache/conftool/dbconfig/20220803-145828-marostegui.json
* 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:53 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.19 (duration: 05m 37s)
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:47 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.21 (duration: 06m 13s)
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32236 and previous config saved to /var/cache/conftool/dbconfig/20220803-144322-marostegui.json
* 14:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 14:33 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 14:32 Emperor: shutdown aqs200[5-8] prior to PDU work [[phab:T310070|T310070]]
* 14:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
* 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
* 14:31 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
* 14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
* 14:28 jelto: power off thumbor2003 and thumbor2004
* 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32235 and previous config saved to /var/cache/conftool/dbconfig/20220803-142816-marostegui.json
* 14:27 moritzm: upgrading ganeti/esams to Ganeti 3.0.2 [[phab:T312637|T312637]]
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32234 and previous config saved to /var/cache/conftool/dbconfig/20220803-141310-marostegui.json
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32233 and previous config saved to /var/cache/conftool/dbconfig/20220803-141103-marostegui.json
* 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32232 and previous config saved to /var/cache/conftool/dbconfig/20220803-141042-marostegui.json
* 14:06 moritzm: installing freetype security updates on bullseye
* 13:57 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin 'P<nowiki>{</nowiki>R:Class = Confd<nowiki>}</nowiki>' 'systemctl restart confd'
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32231 and previous config saved to /var/cache/conftool/dbconfig/20220803-135536-marostegui.json
* 13:46 cdanis: ✔️ cdanis@deploy1002.eqiad.wmnet ~ 🕙☕ sudo systemctl restart confd
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32230 and previous config saved to /var/cache/conftool/dbconfig/20220803-134030-marostegui.json
* 13:30 moritzm: installing Java 8 security updates for Buster
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32229 and previous config saved to /var/cache/conftool/dbconfig/20220803-132524-marostegui.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32228 and previous config saved to /var/cache/conftool/dbconfig/20220803-131916-marostegui.json
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32227 and previous config saved to /var/cache/conftool/dbconfig/20220803-131855-marostegui.json
* 13:18 sukhe: depool codfw for PDU upgrade: CR 819798
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:16 urbanecm@deploy1002: Synchronized wmf-config/MetaContactPages.php: {{Gerrit|f89f02e306a1fa580fa41ba56de978f4208ea672}}: Amend license request contact form per Legal ([[phab:T303359|T303359]]) (duration: 09m 27s)
* 13:12 jbond: introduce puppetmaster[12]004 for now as offline
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
* 13:09 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 13:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 13:04 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32226 and previous config saved to /var/cache/conftool/dbconfig/20220803-130348-marostegui.json
* 12:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 12:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: [[phab:T310070|T310070]]
* 12:56 pt1979@cumin1001: START - Cookbook sre.dns.netbox
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32224 and previous config saved to /var/cache/conftool/dbconfig/20220803-124842-marostegui.json
* 12:40 moritzm: uploaded openjdk-8 8u342-b07-1~deb10u1 to component/jdk8 for buster-wikimedia (rebuild of latest Java 8 security update)
* 12:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 12:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32223 and previous config saved to /var/cache/conftool/dbconfig/20220803-123336-marostegui.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32222 and previous config saved to /var/cache/conftool/dbconfig/20220803-122929-marostegui.json
* 12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32221 and previous config saved to /var/cache/conftool/dbconfig/20220803-122819-marostegui.json
* 12:16 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@614f7b2]: (no justification provided) (duration: 00m 11s)
* 12:16 ebysans@deploy1002: Started deploy [airflow-dags/analytics@614f7b2]: (no justification provided)
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32220 and previous config saved to /var/cache/conftool/dbconfig/20220803-121313-marostegui.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32219 and previous config saved to /var/cache/conftool/dbconfig/20220803-115807-marostegui.json
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2176 to s1 [[phab:T311494|T311494]]', diff saved to https://phabricator.wikimedia.org/P32218 and previous config saved to /var/cache/conftool/dbconfig/20220803-115706-marostegui.json
* 11:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, [[phab:T310145|T310145]]
* 11:49 root@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, [[phab:T310145|T310145]]
* 11:46 jayme@cumin1001: conftool action : set/weight=10; selector: name=(kubernetes2019.codfw.wmnet{{!}}kubernetes2021.codfw.wmnet{{!}}kubernetes2022.codfw.wmnet{{!}}kubernetes2018.codfw.wmnet{{!}}kubernetes2020.codfw.wmnet)
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32217 and previous config saved to /var/cache/conftool/dbconfig/20220803-114301-marostegui.json
* 11:41 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=(kubernetes2020.codfw.wmnet{{!}}kubernetes2009.codfw.wmnet{{!}}kubernetes2010.codfw.wmnet{{!}}kubernetes2011.codfw.wmnet{{!}}kubernetes2012.codfw.wmnet{{!}}kubestage2002.codfw.wmnet)
* 11:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase2022.codfw.wmnet
* 11:37 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
* 11:35 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:32 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 11:26 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wdqs
* 11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=kartotherian
* 11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-backend
* 11:21 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
* 11:17 _joe_: depooling codfw services from all traffic
* 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2011.codfw.wmnet to cluster codfw and group C
* 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2011.codfw.wmnet to cluster codfw and group C
* 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
* 10:47 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
* 10:46 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32216 and previous config saved to /var/cache/conftool/dbconfig/20220803-104246-marostegui.json
* 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32215 and previous config saved to /var/cache/conftool/dbconfig/20220803-104224-marostegui.json
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
* 10:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase201[45].codfw.wmnet
* 10:38 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2022.codfw.wmnet
* 10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
* 10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
* 10:37 jelto: shutdown kubestage2002 kubernetes2020 kubernetes2009 kubernetes2010 kubernetes2011 kubernetes2012
* 10:30 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
* 10:30 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
* 10:29 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 10:29 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
* 10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
* 10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32213 and previous config saved to /var/cache/conftool/dbconfig/20220803-102718-marostegui.json
* 10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2012.codfw.wmnet
* 10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2011.codfw.wmnet
* 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.codfw.wmnet
* 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2009.codfw.wmnet
* 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2020.codfw.wmnet
* 10:20 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubestage2002.codfw.wmnet
* 10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
* 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2011.codfw.wmnet with OS bullseye
* 10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
* 10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32212 and previous config saved to /var/cache/conftool/dbconfig/20220803-101212-marostegui.json
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32211 and previous config saved to /var/cache/conftool/dbconfig/20220803-095706-marostegui.json
* 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
* 09:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2021.codfw.wmnet
* 09:56 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2012.codfw.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32210 and previous config saved to /var/cache/conftool/dbconfig/20220803-095559-marostegui.json
* 09:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 09:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32209 and previous config saved to /var/cache/conftool/dbconfig/20220803-095538-marostegui.json
* 09:55 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase2027.codfw.wmnet
* 09:54 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2011.codfw.wmnet
* 09:54 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:54 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
* 09:52 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2010.codfw.wmnet
* 09:50 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2009.codfw.wmnet
* 09:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
* 09:48 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
* 09:47 jelto: kubectl drain --ignore-daemonsets kubernetes2020.codfw.wmnet
* 09:46 jelto: kubectl cordon kubernetes2020.codfw.wmnet kubernetes2009.codfw.wmnet kubernetes2010.codfw.wmnet kubernetes2011.codfw.wmnet kubernetes2012.codfw.wmnet
* 09:43 jelto: kubectl drain --ignore-daemonsets kubestage2002.codfw.wmnet
* 09:43 vgutierrez: rolling restart of pybal in codfw lvs instances - [[phab:T310070|T310070]]
* 09:42 jelto: kubectl cordon kubestage2002
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32208 and previous config saved to /var/cache/conftool/dbconfig/20220803-094032-marostegui.json
* 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS bullseye
* 09:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@674bb8b]: (no justification provided) (duration: 00m 10s)
* 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2090.codfw.wmnet
* 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:33 ebysans@deploy1002: Started deploy [airflow-dags/analytics@674bb8b]: (no justification provided)
* 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 09:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 09:29 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2090.codfw.wmnet
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32207 and previous config saved to /var/cache/conftool/dbconfig/20220803-092525-marostegui.json
* 09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:23 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
* 09:23 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:22 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 09:22 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2090 from dbctl [[phab:T314109|T314109]]', diff saved to https://phabricator.wikimedia.org/P32206 and previous config saved to /var/cache/conftool/dbconfig/20220803-092053-marostegui.json
* 09:20 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
* 09:15 jelto: power on mc2024
* 09:10 XioNoX: configure BGP on the esams-drmrs link - [[phab:T307221|T307221]]
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32205 and previous config saved to /var/cache/conftool/dbconfig/20220803-091019-marostegui.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32204 and previous config saved to /var/cache/conftool/dbconfig/20220803-090912-marostegui.json
* 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 09:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32203 and previous config saved to /var/cache/conftool/dbconfig/20220803-090836-marostegui.json
* 09:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
* 09:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
* 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
* 09:04 jynus: stop backup2006 backup2009 for [[phab:T310070|T310070]]
* 09:00 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
* 09:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
* 08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
* 08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
* 08:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
* 08:58 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
* 08:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
* 08:57 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
* 08:57 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
* 08:54 XioNoX: put the esams-drmrs link in service - [[phab:T307221|T307221]]
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32202 and previous config saved to /var/cache/conftool/dbconfig/20220803-085330-marostegui.json
* 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
* 08:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:47 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 08:41 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32201 and previous config saved to /var/cache/conftool/dbconfig/20220803-083824-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32200 and previous config saved to /var/cache/conftool/dbconfig/20220803-082318-marostegui.json
* 08:19 jynus: stop db2098 for [[phab:T310070|T310070]]
* 08:17 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2072.codfw.wmnet
* 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2072.codfw.wmnet
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2072 from dbctl [[phab:T313911|T313911]]', diff saved to https://phabricator.wikimedia.org/P32199 and previous config saved to /var/cache/conftool/dbconfig/20220803-074806-marostegui.json
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32197 and previous config saved to /var/cache/conftool/dbconfig/20220803-072253-marostegui.json
* 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32196 and previous config saved to /var/cache/conftool/dbconfig/20220803-072214-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
* 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
* 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
* 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
* 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
* 07:11 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:819227{{!}}CX: Set MT threshold for publishing in Armenian WP to 80% (T313208)]] (duration: 03m 49s)
* 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32195 and previous config saved to /var/cache/conftool/dbconfig/20220803-070708-marostegui.json
* 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:00 moritzm: draining ganeti2011 [[phab:T311686|T311686]]
* 06:56 godog: grow sda/sdb 3 by 100G on thanos-be2003 - [[phab:T314275|T314275]]
* 06:56 godog: grow sda/sdb 3 by 100G on thanos-be1002 - [[phab:T314275|T314275]]
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32194 and previous config saved to /var/cache/conftool/dbconfig/20220803-065202-marostegui.json
* 06:46 godog: power up centrallog2002 and prometheus2005 - [[phab:T310070|T310070]]
* 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 06:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32193 and previous config saved to /var/cache/conftool/dbconfig/20220803-063656-marostegui.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32192 and previous config saved to /var/cache/conftool/dbconfig/20220803-063148-marostegui.json
* 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
* 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
* 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32191 and previous config saved to /var/cache/conftool/dbconfig/20220803-063045-marostegui.json
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32190 and previous config saved to /var/cache/conftool/dbconfig/20220803-061538-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32189 and previous config saved to /var/cache/conftool/dbconfig/20220803-060032-marostegui.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32188 and previous config saved to /var/cache/conftool/dbconfig/20220803-054526-marostegui.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32187 and previous config saved to /var/cache/conftool/dbconfig/20220803-054106-marostegui.json
* 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
== 2022-08-02 ==
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:15 mutante: gerrit - syncing data (/srv/gerrit /var/lib/gerrit2/review_site /home) again after gerrit2002 was reimaged with buster [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 22:04 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 06s)
* 22:04 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]]
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:29 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/includes/Sanity/Checker.php: Backport: [[gerrit:819621{{!}}Fix appending of join conds (T312421 T314439)]] (duration: 03m 15s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:27 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - [[phab:T314078|T314078]]
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS buster
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.22 refs [[phab:T308076|T308076]]
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 20:50 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]]
* 20:38 mutante: re-imaging gerrit2002 with buster - because it's on bullseye, needs git-fat and that has not been ported to python3 yet which blocks upgrading gerrit machines otherwise [[phab:T313250|T313250]] [[phab:T243027|T243027]] [[phab:T279509|T279509]]
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS buster
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:36 urbanecm: UTC evening B&C window done
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/HTMLTransformInput.php: {{Gerrit|69e91528a5c6f372af520307dc2f4227b9981442}}: ParsoidHandler: fix page bundle input with no orig HTML (duration: 03m 22s)
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:29 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/ParsoidHandler.php: {{Gerrit|322a960e3777bc01fa8823908340c36e3851a648}}: ParsoidHandler: pass metrics object to HTMLTransformInput (duration: 03m 19s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5fac0aaf8e76a6f8cc3302771eac068e4f866e5f}}: GrowthExperiments: Remove wgGEHomepageTutorialTitle (duration: 03m 26s)
* 20:06 dancy@deploy1002: Finished scap: Backport for [[gerrit:819612]] Revert "Bump wikimedia/parsoid to 0.16.0-a18" (duration: 11m 30s)
* 20:01 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 05s)
* 20:01 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 19:59 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 01s)
* 19:59 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
* 19:55 dancy@deploy1002: Started scap: Backport for [[gerrit:819612]] Revert "Bump wikimedia/parsoid to 0.16.0-a18"
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-tls
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=varnish-fe
* 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-tls
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=varnish-fe
* 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
* 19:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2041,2046].codfw.wmnet
* 19:35 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2041,2046].codfw.wmnet
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-fe2002.codfw.wmnet
* 19:28 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for thanos-fe2002.codfw.wmnet
* 19:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe2010.codfw.wmnet
* 19:26 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-fe2010.codfw.wmnet
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-tls
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=varnish-fe
* 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-be
* 19:17 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
* 19:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-tls
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=varnish-fe
* 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
* 19:11 mutante: gerrit1001 - rsyncing /home/ to gerrit2002:/srv/home-gerrit1001.wikimedia.org [[phab:T313250|T313250]]
* 19:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
* 19:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
* 18:55 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]] (duration: 50m 39s)
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:52 ejegg: updated payments-wiki from {{Gerrit|589bb64e}} to {{Gerrit|e1b6036a}} (just i18n changes in extensions)
* 18:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - [[phab:T314078|T314078]]
* 18:46 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mc2038.codfw.wmnet with reason: install
* 18:45 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mc2038.codfw.wmnet with reason: install
* 18:41 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2038.codfw.wmnet
* 18:41 rzl@cumin2002: START - Cookbook sre.hosts.remove-downtime for mc2038.codfw.wmnet
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:18 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: install
* 18:18 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: install
* 18:17 rzl@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2038.codfw.wmnet with reason: install
* 18:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2038.codfw.wmnet with reason: install
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
* 18:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:04 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.23 refs [[phab:T308076|T308076]]
* 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32185 and previous config saved to /var/cache/conftool/dbconfig/20220802-175233-marostegui.json
* 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2159', diff saved to https://phabricator.wikimedia.org/P32184 and previous config saved to /var/cache/conftool/dbconfig/20220802-174311-ladsgroup.json
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32183 and previous config saved to /var/cache/conftool/dbconfig/20220802-173723-marostegui.json
* 17:35 moritzm: installing node-moment security updates
* 17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: [[phab:T310070|T310070]]
* 17:32 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: [[phab:T310070|T310070]]
* 17:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
* 17:25 moritzm: installing fribidi security updates
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32182 and previous config saved to /var/cache/conftool/dbconfig/20220802-172217-marostegui.json
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-tls
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=varnish-fe
* 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
* 17:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32181 and previous config saved to /var/cache/conftool/dbconfig/20220802-170711-marostegui.json
* 17:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
* 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
* 17:05 Emperor: ms-be20[31,32,41,46].codfw.wmnet,ms-fe2010.codfw.wmnet,thanos-fe2002.codfw.wmnet downtime for PDU work [[phab:T309957|T309957]]
* 17:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32180 and previous config saved to /var/cache/conftool/dbconfig/20220802-170503-marostegui.json
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 17:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
* 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 17:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
* 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 17:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32179 and previous config saved to /var/cache/conftool/dbconfig/20220802-170333-marostegui.json
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-tls
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=varnish-fe
* 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
* 17:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2030,2045,2052].codfw.wmnet
* 17:00 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2030,2045,2052].codfw.wmnet
* 16:57 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1004.eqiad.wmnet
* 16:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 16:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32178 and previous config saved to /var/cache/conftool/dbconfig/20220802-164827-marostegui.json
* 16:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 16:35 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 16:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32177 and previous config saved to /var/cache/conftool/dbconfig/20220802-163321-marostegui.json
* 16:29 dancy@mwmaint1002: pull aborted:  (duration: 00m 07s)
* 16:25 rzl: rzl@stat1007:~$ sudo systemctl stop wmde-analytics-daily-early  # wedged, timer will restart it now with max_runtime_seconds
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32176 and previous config saved to /var/cache/conftool/dbconfig/20220802-161815-marostegui.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32175 and previous config saved to /var/cache/conftool/dbconfig/20220802-161607-marostegui.json
* 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32174 and previous config saved to /var/cache/conftool/dbconfig/20220802-161545-marostegui.json
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1004.eqiad.wmnet on all recursors
* 16:10 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1004.eqiad.wmnet on all recursors
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:05 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1004.eqiad.wmnet
* 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32173 and previous config saved to /var/cache/conftool/dbconfig/20220802-160039-marostegui.json
* 15:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32172 and previous config saved to /var/cache/conftool/dbconfig/20220802-154533-marostegui.json
* 15:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
* 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
* 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2037.codfw.wmnet
* 15:36 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32171 and previous config saved to /var/cache/conftool/dbconfig/20220802-153027-marostegui.json
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32170 and previous config saved to /var/cache/conftool/dbconfig/20220802-152818-marostegui.json
* 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32169 and previous config saved to /var/cache/conftool/dbconfig/20220802-152740-marostegui.json
* 15:24 moritzm: installing gnupg2 security updates
* 15:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
* 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1004.eqiad.wmnet with OS buster
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32167 and previous config saved to /var/cache/conftool/dbconfig/20220802-151234-marostegui.json
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:08 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
* 15:08 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
* 15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T310070|T310070]]
* 15:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 15:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
* 15:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
* 14:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 14:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: [[phab:T309957|T309957]]
* 14:58 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32166 and previous config saved to /var/cache/conftool/dbconfig/20220802-145728-marostegui.json
* 14:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2060.codfw.wmnet with OS bullseye
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
* 14:50 moritzm: uploaded gnupg2 2.1.18-8~deb9u4+wmf1 to stretch-wikimedia
* 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32164 and previous config saved to /var/cache/conftool/dbconfig/20220802-144222-marostegui.json
* 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32163 and previous config saved to /var/cache/conftool/dbconfig/20220802-144013-marostegui.json
* 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32162 and previous config saved to /var/cache/conftool/dbconfig/20220802-143952-marostegui.json
* 14:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetmaster1004.eqiad.wmnet with OS buster
* 14:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
* 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32161 and previous config saved to /var/cache/conftool/dbconfig/20220802-142446-marostegui.json
* 14:23 Emperor: shutdown ms-be20[30,45,52] for PDU work [[phab:T309957|T309957]]
* 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 14:21 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
* 14:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32160 and previous config saved to /var/cache/conftool/dbconfig/20220802-140940-marostegui.json
* 14:05 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster2004.codfw.wmnet with OS buster
* 14:04 godog: grow sda/sdb 3 by 100G on thanos-be1001 - [[phab:T314275|T314275]]
* 14:03 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
* 14:03 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
* 14:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
* 14:01 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-tls
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2032.codfw.wmnet,service=ats-be
* 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
* 13:56 godog: schedule poweroff for centrallog2002 at 16 utc - [[phab:T310070|T310070]]
* 13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-be
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32159 and previous config saved to /var/cache/conftool/dbconfig/20220802-135435-marostegui.json
* 13:53 godog: depool and poweroff prometheus2005 - [[phab:T310070|T310070]]
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
* 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=varnish-fe
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32158 and previous config saved to /var/cache/conftool/dbconfig/20220802-135226-marostegui.json
* 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
* 13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32157 and previous config saved to /var/cache/conftool/dbconfig/20220802-135155-marostegui.json
* 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
* 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-tls
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=varnish-fe
* 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-be
* 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
* 13:42 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2013.codfw.wmnet with OS bullseye
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754933{{!}}Enable usage tracking for statement for cebwiki (T296384)]] – expected to gradually increase number of wbc_entity_usage and probably recentchanges rows on cebwiki, but not too much, see task for details (duration: 03m 06s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2028.codfw.wmnet with OS bullseye
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32156 and previous config saved to /var/cache/conftool/dbconfig/20220802-133648-marostegui.json
* 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:754937{{!}}Introduce $wmgEntityUsageModifierLimitsStatement (T296384)]] (2/2) (duration: 03m 21s)
* 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754937{{!}}Introduce $wmgEntityUsageModifierLimitsStatement (T296384)]] (1/2) (duration: 03m 16s)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T309957|T309957]]
* 13:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, [[phab:T309957|T309957]]
* 13:27 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster2004.codfw.wmnet with OS buster
* 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
* 13:24 vgutierrez: restarting ATS 9.x instances to apply https://gerrit.wikimedia.org/r/819585 - [[phab:T309651|T309651]]
* 13:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32155 and previous config saved to /var/cache/conftool/dbconfig/20220802-132142-marostegui.json
* 13:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
* 13:19 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a4499e5ac23a0558bed276e2b74134590afc5c95}}: Revert "testwiki: Add mediawiki.web_ui.interactions stream" ([[phab:T314151|T314151]], [[phab:T311268|T311268]]) (duration: 03m 19s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c2fb8a58d8f62e29a15ebee26198e79e4597d24c}}: Enable RealtimePreview on Group 0 wikis ([[phab:T314150|T314150]]) (duration: 03m 21s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32154 and previous config saved to /var/cache/conftool/dbconfig/20220802-130636-marostegui.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32153 and previous config saved to /var/cache/conftool/dbconfig/20220802-130428-marostegui.json
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 13:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32152 and previous config saved to /var/cache/conftool/dbconfig/20220802-130351-marostegui.json
* 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2013.codfw.wmnet with OS bullseye
* 13:00 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2028.codfw.wmnet with OS bullseye
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32151 and previous config saved to /var/cache/conftool/dbconfig/20220802-124845-marostegui.json
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32150 and previous config saved to /var/cache/conftool/dbconfig/20220802-123338-marostegui.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32149 and previous config saved to /var/cache/conftool/dbconfig/20220802-121832-marostegui.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312972|T312972]])', diff saved to https://phabricator.wikimedia.org/P32148 and previous config saved to /var/cache/conftool/dbconfig/20220802-121624-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:01 marostegui: dbmaint x1@eqiad [[phab:T314087|T314087]]
* 11:57 marostegui: dbmaint s7@eqiad [[phab:T314377|T314377]]
* 11:57 marostegui: dbmaint s3@eqiad [[phab:T314377|T314377]]
* 11:57 marostegui: dbmaint s8@eqiad [[phab:T314377|T314377]]
* 11:55 marostegui: dbmaint s8@eqiad [[phab:T314377|T314377]]
* 11:54 marostegui: dbmaint s3@eqiad [[phab:T314377|T314377]]
* 11:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 11:48 marostegui: dbmaint s7@eqiad [[phab:T314377|T314377]]
* 11:46 marostegui: dbmaint s4@eqiad [[phab:T314377|T314377]]
* 11:35 elukey: restart rsyslog on ml-serve1006
* 10:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: [[phab:T312626|T312626]] btullis
* 10:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: [[phab:T312626|T312626]] btullis
* 10:49 godog: grow sda3 by 100G on thanos-be2004 - [[phab:T314275|T314275]]
* 10:42 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 10:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P32147 and previous config saved to /var/cache/conftool/dbconfig/20220802-103318-root.json
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P32146 and previous config saved to /var/cache/conftool/dbconfig/20220802-101813-root.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2175 to s2 [[phab:T311494|T311494]]', diff saved to https://phabricator.wikimedia.org/P32145 and previous config saved to /var/cache/conftool/dbconfig/20220802-101522-marostegui.json
* 10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1019.eqiad.wmnet with OS bullseye
* 10:05 jynus: shutdown dbprov2002 backup2005 backup2008 [[phab:T310070|T310070]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P32144 and previous config saved to /var/cache/conftool/dbconfig/20220802-100308-root.json
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32143 and previous config saved to /var/cache/conftool/dbconfig/20220802-100304-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2079 from dbctl [[phab:T313885|T313885]]', diff saved to https://phabricator.wikimedia.org/P32141 and previous config saved to /var/cache/conftool/dbconfig/20220802-095455-marostegui.json
* 09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
* 09:49 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
* 09:49 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P32140 and previous config saved to /var/cache/conftool/dbconfig/20220802-094804-root.json
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32139 and previous config saved to /var/cache/conftool/dbconfig/20220802-094759-root.json
* 09:44 godog: grow sdb3 by 100G on thanos-be2004 - [[phab:T314275|T314275]]
* 09:43 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
* 09:42 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 09:37 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1019.eqiad.wmnet with OS bullseye
* 09:36 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P32138 and previous config saved to /var/cache/conftool/dbconfig/20220802-093259-root.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32137 and previous config saved to /var/cache/conftool/dbconfig/20220802-093254-root.json
* 09:30 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 09:30 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:28 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
* 09:26 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 09:25 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 09:22 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P32136 and previous config saved to /var/cache/conftool/dbconfig/20220802-091754-root.json
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32135 and previous config saved to /var/cache/conftool/dbconfig/20220802-091749-root.json
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2143', diff saved to https://phabricator.wikimedia.org/P32134 and previous config saved to /var/cache/conftool/dbconfig/20220802-091518-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P32133 and previous config saved to /var/cache/conftool/dbconfig/20220802-090250-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32132 and previous config saved to /var/cache/conftool/dbconfig/20220802-090245-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P32131 and previous config saved to /var/cache/conftool/dbconfig/20220802-084745-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32130 and previous config saved to /var/cache/conftool/dbconfig/20220802-084740-root.json
* 08:46 marostegui: stop mysql on db2095 db2107 db2109 db2137 db2147 db2159 db2160 pc2012 for pdu maintenance on codfw b5 [[phab:T310070|T310070]]
* 07:49 moritzm: upgrading drmrs ganeti clusters to 3.0.2 [[phab:T312637|T312637]]
* 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, [[phab:T311686|T311686]]
* 07:22 godog: bounce icinga on alert2001 - [[phab:T314353|T314353]]
* 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 07:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 06:58 elukey: restart rsyslog on ml-serve2006
* 06:56 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:819077{{!}}pruneRevData: Make cleaning in larger batches (T296380)]] (duration: 03m 26s)
* 06:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:46 godog: bounce icinga on alert1001 - [[phab:T314353|T314353]]
* 05:48 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db2088.codfw.wmnet
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:44 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2088.codfw.wmnet
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P32127 and previous config saved to /var/cache/conftool/dbconfig/20220802-052923-root.json
* 05:24 marostegui: dbmaint x1@eqiad [[phab:T314087|T314087]]
* 04:17 ryankemper: [Elastic] Small amendment to my earlier statement; based off epoch time `be_x_oldwiki_titlesuggest_1659407912` was not an old index hanging around after a reindex operation, but rather the new one that the reindex operation was trying to create, but had not yet finished (therefore didn't switch over the aliases). It presumably got interrupted by the reimage of `elastic2059`.
* 04:15 ryankemper: [Elastic] Blew away red index like so: `ryankemper@cumin1001:~$ curl -XDELETE https://search.svc.codfw.wmnet:9243/be_x_oldwiki_titlesuggest_1659407912`. Cluster is back to `green` status.
* 04:07 ryankemper: [Elastic] Per `curl -s https://search.svc.codfw.wmnet:9243/_cat/aliases {{!}} grep -i be_x` I see `be_x_oldwiki_titlesuggest` alias points to `be_x_oldwiki_titlesuggest_1658396688`. I think this means the red index is an old index from an in-progress reindex operation. I likely just need to delete `be_x_oldwiki_titlesuggest_1659407912` but doing some quick digging first
* 04:04 ryankemper: [Elastic] Red cluster status in main codfw elasticsearch cluster (`https://search.svc.codfw.wmnet:9243`); culprit appears to be index `be_x_oldwiki_titlesuggest_1659407912`. Confusingly it has 2 replicas set so it's not clear to me how we got into this state starting from green (in the past we've gone into red status from indices that erroneously had 0 replicas in production)
* 03:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:40 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I0802db272695}} (duration: 03m 10s)
* 03:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:34 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I9b89c0ff5c2}} (duration: 03m 32s)
* 03:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:27 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I6e97d39a3}}, {{Gerrit|Ib843ebced31}} (duration: 03m 30s)
* 03:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:22 krinkle@mwmaint1002: pull aborted:  (duration: 00m 11s)
* 03:21 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I39a2b86065}} (duration: 03m 19s)
* 03:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2059.codfw.wmnet with OS bullseye
* 03:15 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ieaea60a991e5611}} (duration: 03m 03s)
* 03:14 krinkle@mwmaint2002: pull aborted:  (duration: 01m 36s)
* 03:14 krinkle@mwmaint1002: pull aborted:  (duration: 01m 31s)
* 03:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
* 02:54 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service` to clear `Query Service HTTP Port` && `WDQS SPARQL` alerts
* 02:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
* 02:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2059.codfw.wmnet with OS bullseye
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:35 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ieaea60a991e5}} (duration: 03m 10s)
* 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:23 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ia3406eba4ab8bb}} (duration: 03m 22s)
* 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
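The 04:17 [Elastic] entry above reasons from the epoch time embedded in the index name. As a rough illustration of that step only, the following Python sketch decodes the numeric suffix of the two index names quoted in the log. It assumes the suffix is a Unix timestamp in seconds, which is what the entry implies; the helper name `index_created_at` is hypothetical and not part of any Wikimedia or Elasticsearch tooling.

```python
from datetime import datetime, timezone


def index_created_at(index_name: str) -> datetime:
    """Decode the trailing numeric suffix of an index name as a UTC time.

    Assumption: the suffix after the last underscore is a Unix epoch
    timestamp in seconds, as the log entry suggests.
    """
    _, _, suffix = index_name.rpartition("_")
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


# Index names copied from the log entries; nothing here talks to a cluster.
for name in (
    "be_x_oldwiki_titlesuggest_1658396688",  # index the alias pointed to
    "be_x_oldwiki_titlesuggest_1659407912",  # red index that was deleted
):
    print(name, index_created_at(name).isoformat())
```

Under that assumption, the second suffix decodes to 2022-08-02 02:38:32 UTC, two minutes after the elastic2059 reimage started at 02:36. That is consistent with the log's conclusion that this index was the new one still being built rather than a leftover.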
== 2022-08-01 ==
* 23:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Id1ce285631f5}}, {{Gerrit|I194d419fbfe}} (duration: 03m 09s)
* 23:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:08 moritzm: drain ganeti2028 [[phab:T309957|T309957]]
* 21:03 mutante: gerrit2002 - mkdir /var/lib/gerrit2/review_site {{!}} gerrit1001 - rsyncing /var/lib/gerrit2/review_site/ to gerrit2002 [[phab:T313250|T313250]] [[phab:T313972|T313972]]
* 21:01 urbanecm: UTC late backport window done
* 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|461e0709a8987b110f669b74afc38c706b616e5d}}: itwiki: Change robot policy on NS2 and NS3 ([[phab:T314165|T314165]]) (duration: 03m 18s)
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:57 mutante: phab1001 - rsyncing repo data /srv/repos/ to phab2002 (in addition to phab1004 previously) [[phab:T313360|T313360]]
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:55 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mnwwiktionary  --fix # [[phab:T314023|T314023]]
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba8c17759b7e737a6757792ad4136ff3af00030c}}: mnwwiktionary: Create Appendix namespace ([[phab:T314023|T314023]]) (duration: 03m 09s)
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateArticleCount.php --wiki=viwikibooks --update # [[phab:T314239|T314239]]
* 20:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c19c3e36ab}}: DiscussionTools: Make new reply buttons available at mediawiki.org ([[phab:T314076|T314076]]); {{Gerrit|24db016c4}}: viwikibooks: Change wgArticleCountMethod to any ([[phab:T314239|T314239]]) (duration: 03m 10s)
* 20:35 daniel@deploy1002: Synchronized php-1.39.0-wmf.22/includes/Rest/Handler: Fix: [[gerrit:819129{{!}}Parsoid REST handler: allow pagebundle input without original HTML.]] (duration: 03m 15s)
* 20:25 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-ne.svg ([[phab:T311700|T311700]])
* 20:21 daniel@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ne.svg: Config: [[gerrit:818614{{!}}newiki: Update wordmark (T311700)]] (duration: 03m 17s)
* 20:17 daniel@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:818614{{!}}newiki: Update wordmark (T311700)]] (duration: 03m 32s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2054.codfw.wmnet with OS bullseye
* 19:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
* 19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
* 19:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2054.codfw.wmnet with OS bullseye
* 18:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2031.codfw.wmnet with OS bullseye
* 18:44 mutante: gitlab - moved data_persistence group to new parent, under /repos/
* 18:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
* 18:32 mutante: gitlab - created group 'data_persistence' - added Ladsgroup and upgraded from member to maintainer
* 18:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
* 18:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2031.codfw.wmnet with OS bullseye
* 17:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2025.codfw.wmnet with OS bullseye
* 17:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
* 17:31 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
* 17:18 ryankemper: [[phab:T289135|T289135]] [[phab:T314078|T314078]] Manually reimaging remaining codfw stretch hosts (`elastic[2025,2031,2054,2059-2060]`) to bullseye, one host at a time, waiting for green cluster status to return between each run. `ryankemper@cumin1001` tmux session `codfw_reimage`
* 17:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2025.codfw.wmnet with OS bullseye
* 17:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 17:08 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 17:06 mutante: alert1001 - systemctl restart nsca - pinged by fundraising tech because fundraising hosts have the "passive check is awol" issue again ([[phab:T196336|T196336]])
* 16:25 moritzm: installing tcpdump updates from bullseye point release
* 16:23 cwhite@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
* 16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1018.eqiad.wmnet with OS bullseye
* 16:10 cwhite@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
* 15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1018.eqiad.wmnet with OS bullseye
* 15:39 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase [[phab:T309896|T309896]] - mvernon@cumin1001
* 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:29 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase [[phab:T309896|T309896]] - mvernon@cumin1001
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:818127{{!}}Beta: add configuration for redirect badges (T313896)]] (2/2, should be a no-op) (duration: 03m 30s)
* 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:818127{{!}}Beta: add configuration for redirect badges (T313896)]] (1/2, should be a no-op) (duration: 03m 15s)
* 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:54 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:53 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:42 moritzm: installing openjdk-11 security updates
* 14:39 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:38 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:34 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:28 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
* 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/skins/Vector/: {{Gerrit|b5007c5f1c389deb344c5bb99e950b4190436cab}}: Revert "styles: Unify on standard external link icon"" (duration: 03m 16s)
* 14:12 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:12 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2044.codfw.wmnet with OS bullseye
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|bcb7b0d4d07b454a169804d7b1011ec3f2530c00}}: Adjust width-height ratio of logo to fix display issue ([[phab:T310961|T310961]]; 2/2) (duration: 03m 17s)
* 14:04 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/srwikisource<nowiki>{</nowiki>.png;-1.5x.png;-2x.png<nowiki>}</nowiki> ([[phab:T310961|T310961]])
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|bcb7b0d4d07b454a169804d7b1011ec3f2530c00}}: srwikisource: Adjust width-height ratio of logo to fix display issue ([[phab:T310961|T310961]]; 1/2) (duration: 03m 41s)
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 urbanecm: UTC afternoon backport window is going to overflow by a couple of minutes
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
* 13:44 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
* 13:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2044.codfw.wmnet with OS bullseye
* 13:22 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 11:50 moritzm: installing openjdk-8 security updates for stretch
* 11:43 moritzm: uploaded openjdk-8 8u342-b07-1~deb9u1 for stretch-wikimedia
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32124 and previous config saved to /var/cache/conftool/dbconfig/20220801-102714-ladsgroup.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32123 and previous config saved to /var/cache/conftool/dbconfig/20220801-101208-ladsgroup.json
* 10:09 vgutierrez: test ATS 9.1.2 on cp6016 - [[phab:T309651|T309651]]
* 10:05 vgutierrez: test ATS 9.1.2 on cp6008 - [[phab:T309651|T309651]]
* 10:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@4da9195]: (no justification provided) (duration: 00m 19s)
* 10:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@4da9195]: (no justification provided)
* 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32122 and previous config saved to /var/cache/conftool/dbconfig/20220801-095702-ladsgroup.json
* 09:56 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 05s)
* 09:56 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
* 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32121 and previous config saved to /var/cache/conftool/dbconfig/20220801-094156-ladsgroup.json
* 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P32120 and previous config saved to /var/cache/conftool/dbconfig/20220801-093845-ladsgroup.json
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Maintenance
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 09:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
* 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
* 09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
* 09:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
* 09:00 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:53 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/includes/api: Backport: [[gerrit:818562{{!}}api: Support for links migration in ApiQueryBacklinks (T312865 T314112)]] (duration: 03m 01s)
* 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:50 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
* 08:48 godog: thanos-be2004: copy quarantined and tmp off sdb3 and into sdb4 for analysis and to free space - [[phab:T314275|T314275]]
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:818998{{!}}Stop writing to the old templatelinks columns in itwikisource (T312865)]] (duration: 03m 12s)
* 08:43 vgutierrez: rolling upgrade of HAProxy to version 2.4.18
* 08:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:41 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
* 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 08:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
* 08:14 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
* 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers{{!}}api)-ro,name=codfw
* 06:14 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appservers-ro
* 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appserver-ro
* 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=(appserver{{!}}api)-ro
* 05:43 moritzm: installing Linux 5.10.127-2 on Gitlab runners
* 01:00 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ic0dbcba9f60f20a}} (duration: 03m 31s)
* 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:45 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|I9d363abd7cfef}} (duration: 03m 17s)
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
== Archives ==
See [[Server Admin Log/Archives]].

Latest revision as of 21:50, 9 June 2023

2023-06-09

  • 21:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
  • 21:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 20:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
  • 20:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 20:38 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart-reboot (exit_code=97) rolling restart_daemons on A:aqs
  • 20:23 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs
  • 17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS bullseye
  • 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
  • 17:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49398 and previous config saved to /var/cache/conftool/dbconfig/20230609-173202-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P49397 and previous config saved to /var/cache/conftool/dbconfig/20230609-171656-ladsgroup.json
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P49396 and previous config saved to /var/cache/conftool/dbconfig/20230609-170150-ladsgroup.json
  • 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49395 and previous config saved to /var/cache/conftool/dbconfig/20230609-164644-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49394 and previous config saved to /var/cache/conftool/dbconfig/20230609-163007-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49393 and previous config saved to /var/cache/conftool/dbconfig/20230609-162946-ladsgroup.json
  • 16:20 urandom: powercycling restbase1028
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P49392 and previous config saved to /var/cache/conftool/dbconfig/20230609-161440-ladsgroup.json
  • 16:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host snapshot1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['snapshot1016']
  • 16:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['snapshot1016']
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P49391 and previous config saved to /var/cache/conftool/dbconfig/20230609-155934-ladsgroup.json
  • 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host snapshot1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49390 and previous config saved to /var/cache/conftool/dbconfig/20230609-154428-ladsgroup.json
  • 15:30 andrewbogott: wikitech-static: deleted everything in /srv/mediawiki/images/wikitech/archive for T338520
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49388 and previous config saved to /var/cache/conftool/dbconfig/20230609-152845-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49387 and previous config saved to /var/cache/conftool/dbconfig/20230609-152824-ladsgroup.json
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host snapshot1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host snapshot1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for snapshot101[6-7] - pt1979@cumin2002"
  • 15:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for snapshot101[6-7] - pt1979@cumin2002"
  • 15:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P49386 and previous config saved to /var/cache/conftool/dbconfig/20230609-151318-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P49385 and previous config saved to /var/cache/conftool/dbconfig/20230609-145812-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49384 and previous config saved to /var/cache/conftool/dbconfig/20230609-144305-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49383 and previous config saved to /var/cache/conftool/dbconfig/20230609-142731-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49382 and previous config saved to /var/cache/conftool/dbconfig/20230609-142655-ladsgroup.json
  • 14:14 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P49381 and previous config saved to /var/cache/conftool/dbconfig/20230609-141149-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P49380 and previous config saved to /var/cache/conftool/dbconfig/20230609-135643-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49379 and previous config saved to /var/cache/conftool/dbconfig/20230609-134137-ladsgroup.json
  • 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 13:29 sukhe: start pybal on lvs2013
  • 13:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 13:25 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49378 and previous config saved to /var/cache/conftool/dbconfig/20230609-132541-ladsgroup.json
  • 13:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49377 and previous config saved to /var/cache/conftool/dbconfig/20230609-132520-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P49376 and previous config saved to /var/cache/conftool/dbconfig/20230609-131014-ladsgroup.json
  • 13:07 sukhe: stop pybal on lvs2013 to test lvs2014
  • 13:02 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs2014
  • 13:02 sukhe: sudo cumin 'A:lvs and A:codfw' 'enable-puppet "CR 928818"'
  • 13:01 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
  • 12:59 sukhe: sudo cumin 'A:lvs and A:codfw' 'disable-puppet "CR 928818"'
  • 12:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2014
  • 12:57 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
  • 12:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2014
  • 12:55 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P49373 and previous config saved to /var/cache/conftool/dbconfig/20230609-125508-ladsgroup.json
  • 12:50 krinkle@deploy1002: Finished scap: I385d28 (duration: 06m 59s)
  • 12:43 krinkle@deploy1002: Started scap: I385d28
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49371 and previous config saved to /var/cache/conftool/dbconfig/20230609-124002-ladsgroup.json
  • 12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-add DNS for cloud-hosts-codfw vlan. - cmooney@cumin1001"
  • 12:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-add DNS for cloud-hosts-codfw vlan. - cmooney@cumin1001"
  • 12:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49370 and previous config saved to /var/cache/conftool/dbconfig/20230609-122303-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49369 and previous config saved to /var/cache/conftool/dbconfig/20230609-122243-ladsgroup.json
  • 12:16 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:16 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2003-dev - aborrero@cumin2002"
  • 12:15 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2003-dev - aborrero@cumin2002"
  • 12:13 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P49368 and previous config saved to /var/cache/conftool/dbconfig/20230609-120737-ladsgroup.json
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fsero out of all services on: 778 hosts
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P49367 and previous config saved to /var/cache/conftool/dbconfig/20230609-115230-ladsgroup.json
  • 11:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Fsero out of all services on: 778 hosts
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fsero out of all services on: 1262 hosts
  • 11:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Fsero out of all services on: 1262 hosts
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49366 and previous config saved to /var/cache/conftool/dbconfig/20230609-113724-ladsgroup.json
  • 11:27 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49365 and previous config saved to /var/cache/conftool/dbconfig/20230609-112250-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49364 and previous config saved to /var/cache/conftool/dbconfig/20230609-112229-ladsgroup.json
  • 11:20 sukhe: pcc-db1001: sudo systemctl start pcc_facts_processor.service
  • 11:14 sukhe: sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P49363 and previous config saved to /var/cache/conftool/dbconfig/20230609-110723-ladsgroup.json
  • 11:02 sukhe: homer "cr*-codfw*" commit "Gerrit: 928113 add new LVS host lvs2014"
  • 10:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P49362 and previous config saved to /var/cache/conftool/dbconfig/20230609-105217-ladsgroup.json
  • 10:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49361 and previous config saved to /var/cache/conftool/dbconfig/20230609-103711-ladsgroup.json
  • 10:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49360 and previous config saved to /var/cache/conftool/dbconfig/20230609-102217-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49359 and previous config saved to /var/cache/conftool/dbconfig/20230609-102156-ladsgroup.json
  • 10:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 10:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 10:12 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 10:09 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 10:08 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P49358 and previous config saved to /var/cache/conftool/dbconfig/20230609-100650-ladsgroup.json
  • 09:57 elukey: increase {eqiad,codfw}.change-prop.transcludes.resource-change topic partitions (3->5) on kafka main clusters - T338357
  • 09:56 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:54 moritzm: installing jupyter-core security updates on bullseye
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P49357 and previous config saved to /var/cache/conftool/dbconfig/20230609-095144-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49356 and previous config saved to /var/cache/conftool/dbconfig/20230609-093638-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49355 and previous config saved to /var/cache/conftool/dbconfig/20230609-092141-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49354 and previous config saved to /var/cache/conftool/dbconfig/20230609-090829-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P49353 and previous config saved to /var/cache/conftool/dbconfig/20230609-085322-ladsgroup.json
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P49352 and previous config saved to /var/cache/conftool/dbconfig/20230609-083816-ladsgroup.json
  • 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49351 and previous config saved to /var/cache/conftool/dbconfig/20230609-082310-ladsgroup.json
  • 08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49350 and previous config saved to /var/cache/conftool/dbconfig/20230609-080708-ladsgroup.json
  • 08:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49349 and previous config saved to /var/cache/conftool/dbconfig/20230609-080637-ladsgroup.json
  • 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P49348 and previous config saved to /var/cache/conftool/dbconfig/20230609-075130-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P49347 and previous config saved to /var/cache/conftool/dbconfig/20230609-073624-ladsgroup.json
  • 07:33 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1492.eqiad.wmnet
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49346 and previous config saved to /var/cache/conftool/dbconfig/20230609-072118-ladsgroup.json
  • 07:19 moritzm: powercycling restbase2018 (kernel hung following what looks like I/O errors)
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49345 and previous config saved to /var/cache/conftool/dbconfig/20230609-070520-ladsgroup.json
  • 07:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49344 and previous config saved to /var/cache/conftool/dbconfig/20230609-070459-ladsgroup.json
  • 06:50 moritzm: installing wireshark security updates
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P49343 and previous config saved to /var/cache/conftool/dbconfig/20230609-064953-ladsgroup.json
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: puppetmaster2005.codfw.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: puppetmaster2005.codfw.wmnet
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: puppetmaster1005.eqiad.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: puppetmaster1005.eqiad.wmnet
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus3001.esams.wmnet
  • 06:48 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus3001.esams.wmnet
  • 06:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
  • 06:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P49342 and previous config saved to /var/cache/conftool/dbconfig/20230609-063447-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49341 and previous config saved to /var/cache/conftool/dbconfig/20230609-061941-ladsgroup.json
  • 06:06 eileen: config 97c57848 -> 6f4a9d19 restart jobs
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49340 and previous config saved to /var/cache/conftool/dbconfig/20230609-060438-ladsgroup.json
  • 06:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 05:53 eileen: civicrm upgraded from 158896cc to 5bbed553
  • 05:52 eileen: config revision changed from 8b71fa7a to 97c57848
  • 05:50 moritzm: installing cpio security updates
  • 05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49339 and previous config saved to /var/cache/conftool/dbconfig/20230609-052315-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P49338 and previous config saved to /var/cache/conftool/dbconfig/20230609-050809-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P49337 and previous config saved to /var/cache/conftool/dbconfig/20230609-045302-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49336 and previous config saved to /var/cache/conftool/dbconfig/20230609-043756-ladsgroup.json
  • 04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49335 and previous config saved to /var/cache/conftool/dbconfig/20230609-042306-ladsgroup.json
  • 04:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49334 and previous config saved to /var/cache/conftool/dbconfig/20230609-042246-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P49333 and previous config saved to /var/cache/conftool/dbconfig/20230609-040739-ladsgroup.json
  • 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P49332 and previous config saved to /var/cache/conftool/dbconfig/20230609-035233-ladsgroup.json
  • 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49331 and previous config saved to /var/cache/conftool/dbconfig/20230609-033727-ladsgroup.json
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49330 and previous config saved to /var/cache/conftool/dbconfig/20230609-032127-ladsgroup.json
  • 03:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49329 and previous config saved to /var/cache/conftool/dbconfig/20230609-032106-ladsgroup.json
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P49328 and previous config saved to /var/cache/conftool/dbconfig/20230609-030600-ladsgroup.json
  • 02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P49327 and previous config saved to /var/cache/conftool/dbconfig/20230609-025054-ladsgroup.json
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49326 and previous config saved to /var/cache/conftool/dbconfig/20230609-023548-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49325 and previous config saved to /var/cache/conftool/dbconfig/20230609-022054-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 02:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49324 and previous config saved to /var/cache/conftool/dbconfig/20230609-022034-ladsgroup.json
  • 02:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P49323 and previous config saved to /var/cache/conftool/dbconfig/20230609-020528-ladsgroup.json
  • 02:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
  • 02:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
  • 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P49322 and previous config saved to /var/cache/conftool/dbconfig/20230609-015021-ladsgroup.json
  • 01:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
  • 01:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49321 and previous config saved to /var/cache/conftool/dbconfig/20230609-013515-ladsgroup.json
  • 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS bullseye
  • 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49320 and previous config saved to /var/cache/conftool/dbconfig/20230609-011945-ladsgroup.json
  • 01:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 01:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49319 and previous config saved to /var/cache/conftool/dbconfig/20230609-011924-ladsgroup.json
  • 01:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P49318 and previous config saved to /var/cache/conftool/dbconfig/20230609-010418-ladsgroup.json
  • 00:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
  • 00:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
  • 00:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P49317 and previous config saved to /var/cache/conftool/dbconfig/20230609-004912-ladsgroup.json
  • 00:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
  • 00:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 00:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS bullseye
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49316 and previous config saved to /var/cache/conftool/dbconfig/20230609-003406-ladsgroup.json
  • 00:31 eileen: civicrm upgraded from 6f64e77d to 158896cc
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki-root1002']
  • 00:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki-root1002']
  • 00:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki-root1002']
  • 00:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki-root1002']
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49315 and previous config saved to /var/cache/conftool/dbconfig/20230609-001821-ladsgroup.json
  • 00:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49314 and previous config saved to /var/cache/conftool/dbconfig/20230609-001732-ladsgroup.json
  • 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P49313 and previous config saved to /var/cache/conftool/dbconfig/20230609-000226-ladsgroup.json

2023-06-08

  • 23:55 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
  • 23:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 23:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 23:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P49312 and previous config saved to /var/cache/conftool/dbconfig/20230608-234720-ladsgroup.json
  • 23:42 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for pki-root - pt1979@cumin2002"
  • 23:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for pki-root - pt1979@cumin2002"
  • 23:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49311 and previous config saved to /var/cache/conftool/dbconfig/20230608-233214-ladsgroup.json
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49310 and previous config saved to /var/cache/conftool/dbconfig/20230608-231650-ladsgroup.json
  • 23:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49309 and previous config saved to /var/cache/conftool/dbconfig/20230608-231629-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P49308 and previous config saved to /var/cache/conftool/dbconfig/20230608-230123-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P49307 and previous config saved to /var/cache/conftool/dbconfig/20230608-224617-ladsgroup.json
  • 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: decom
  • 22:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: decom
  • 22:37 mutante: gerrit1001 - rmdir /etc/ssh/userkeys/gerrit.d, which leads to puppet warnings because it can't remove the empty dir
  • 22:35 mutante: removing gerrit role from former gerrit prod machine gerrit1001; removes firewall rules, shell access, monitoring, etc.
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49306 and previous config saved to /var/cache/conftool/dbconfig/20230608-223111-ladsgroup.json
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49305 and previous config saved to /var/cache/conftool/dbconfig/20230608-221536-ladsgroup.json
  • 22:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 22:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49304 and previous config saved to /var/cache/conftool/dbconfig/20230608-221515-ladsgroup.json
  • 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P49303 and previous config saved to /var/cache/conftool/dbconfig/20230608-220009-ladsgroup.json
  • 21:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P49302 and previous config saved to /var/cache/conftool/dbconfig/20230608-214503-ladsgroup.json
  • 21:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49301 and previous config saved to /var/cache/conftool/dbconfig/20230608-212957-ladsgroup.json
  • 21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49300 and previous config saved to /var/cache/conftool/dbconfig/20230608-211419-ladsgroup.json
  • 21:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 21:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 21:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 21:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 21:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 21:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49298 and previous config saved to /var/cache/conftool/dbconfig/20230608-204722-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P49297 and previous config saved to /var/cache/conftool/dbconfig/20230608-203216-ladsgroup.json
  • 20:31 ladsgroup@deploy1002: Finished scap: Backport for Externallinks: Make port part of the index (T337149) (duration: 10m 10s)
  • 20:22 ladsgroup@deploy1002: ladsgroup: Backport for Externallinks: Make port part of the index (T337149) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
  • 20:20 ladsgroup@deploy1002: Started scap: Backport for Externallinks: Make port part of the index (T337149)
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P49296 and previous config saved to /var/cache/conftool/dbconfig/20230608-201710-ladsgroup.json
  • 20:12 ladsgroup@deploy1002: Finished scap: Backport for Remove VectorLimitedWidthIndicator (T336197) (duration: 07m 32s)
  • 20:06 ladsgroup@deploy1002: ladsgroup and ksarabia: Backport for Remove VectorLimitedWidthIndicator (T336197) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:05 ladsgroup@deploy1002: Started scap: Backport for Remove VectorLimitedWidthIndicator (T336197)
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49295 and previous config saved to /var/cache/conftool/dbconfig/20230608-200204-ladsgroup.json
  • 20:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
  • 19:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49294 and previous config saved to /var/cache/conftool/dbconfig/20230608-194555-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49293 and previous config saved to /var/cache/conftool/dbconfig/20230608-194534-ladsgroup.json
  • 19:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P49292 and previous config saved to /var/cache/conftool/dbconfig/20230608-193028-ladsgroup.json
  • 19:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P49291 and previous config saved to /var/cache/conftool/dbconfig/20230608-191522-ladsgroup.json
  • 19:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49290 and previous config saved to /var/cache/conftool/dbconfig/20230608-190016-ladsgroup.json
  • 18:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49289 and previous config saved to /var/cache/conftool/dbconfig/20230608-184312-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49288 and previous config saved to /var/cache/conftool/dbconfig/20230608-184251-ladsgroup.json
  • 18:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P49287 and previous config saved to /var/cache/conftool/dbconfig/20230608-182745-ladsgroup.json
  • 18:24 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in eqiad: maintenance
  • 18:19 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in eqiad: maintenance
  • 18:18 urandom: (Re)pooling sessionstore/eqiad — T337426
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P49286 and previous config saved to /var/cache/conftool/dbconfig/20230608-181238-ladsgroup.json
  • 18:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.12 refs T337526
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49285 and previous config saved to /var/cache/conftool/dbconfig/20230608-175732-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49284 and previous config saved to /var/cache/conftool/dbconfig/20230608-174135-ladsgroup.json
  • 17:41 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 17:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:31 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:31 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:30 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:30 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:28 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49283 and previous config saved to /var/cache/conftool/dbconfig/20230608-172746-ladsgroup.json
  • 17:24 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P49282 and previous config saved to /var/cache/conftool/dbconfig/20230608-171240-ladsgroup.json
  • 17:10 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 17:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1006.eqiad.wmnet with OS bullseye
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P49281 and previous config saved to /var/cache/conftool/dbconfig/20230608-165734-ladsgroup.json
  • 16:56 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs
  • 16:46 urandom: Starting traffic test against sessionstore.svc.eqiad.wmnet — T337426
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1006.eqiad.wmnet with reason: host reimage
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49280 and previous config saved to /var/cache/conftool/dbconfig/20230608-164228-ladsgroup.json
  • 16:41 urandom: Upgrading Cassandra to 4.1.1, sessionstore1003 — T337426
  • 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1006.eqiad.wmnet with reason: host reimage
  • 16:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster1006.eqiad.wmnet with OS bullseye
  • 16:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetmaster1006.eqiad.wmnet with OS bullseye
  • 16:35 urandom: Upgrading Cassandra to 4.1.1, sessionstore1002 — T337426
  • 16:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49279 and previous config saved to /var/cache/conftool/dbconfig/20230608-162650-ladsgroup.json
  • 16:26 urandom: Upgrading Cassandra to 4.1.1, sessionstore1001 — T337426
  • 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:22 urandom: creating pre-upgrade Cassandra snapshots, sessionstore/eqiad — T337426
  • 16:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 16:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 16:11 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in eqiad: maintenance
  • 16:06 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2014.codfw.wmnet with OS bullseye
  • 16:06 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in eqiad: maintenance
  • 16:06 urandom: depooling eqiad sessionstore for Cassandra upgrade — T337426
  • 16:00 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 15:58 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2014.codfw.wmnet with OS bullseye
  • 15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster1006.eqiad.wmnet with OS bullseye
  • 15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['puppetmaster1006']
  • 15:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['puppetmaster1006']
  • 15:09 moritzm: installing c-ares security updates on bullseye
  • 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:42 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 14:41 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 14:36 moritzm: installing libwebp security updates on buster
  • 14:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetmaster1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:19 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 14:17 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 14:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
  • 14:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:13 XioNoX: cloudsw2-c8-eqiad> request system zeroize - T338459
  • 14:13 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:09 XioNoX: decom cloudsw2-c8-eqiad - T338459
  • 14:08 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:07 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:06 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 14:02 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 14:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:00 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:59 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:58 ladsgroup@deploy1002: Finished scap: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153) (duration: 09m 13s)
  • 13:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 13:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 13:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
  • 13:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:51 ladsgroup@deploy1002: ladsgroup: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:49 ladsgroup@deploy1002: Started scap: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)
  • 13:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host puppetmaster1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:48 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
  • 13:44 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:44 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 13:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:43 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 13:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:40 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:39 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 13:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 13:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:05 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:05 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:36 cmooney@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 17m 22s)
  • 12:19 topranks: De-pooling lvs1017 to move link to lsw1-e1-eqiad to ssw1-e1-eqiad T322937
  • 12:18 cmooney@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
  • 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:11 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:03 vgutierrez: restore cp4052 HAProxy configuration - T317799
  • 11:51 vgutierrez: repooling cp4052 - T317799
  • 11:40 vgutierrez: depooling cp4052 for some HAProxy tests - T317799
  • 11:28 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=nlwiki --iconv DB cluster26 (T128154)
  • 11:03 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=dawiki --iconv DB cluster27 (T128153)
  • 10:49 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=svwiki --iconv DB cluster27 (T128153)
  • 10:22 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:21 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 09:58 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@bb7526e]: (no justification provided) (duration: 00m 08s)
  • 09:57 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@bb7526e]: (no justification provided)
  • 09:40 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2001.codfw.wmnet with OS bookworm
  • 09:40 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
  • 09:24 vgutierrez: updated to HAProxy 2.7.9 on cp4052 and cp5032
  • 09:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.eqsin.wmnet,cp4052.ulsfo.wmnet} and A:cp
  • 09:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 09:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 09:17 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.eqsin.wmnet,cp4052.ulsfo.wmnet} and A:cp
  • 09:10 vgutierrez: fetch HAProxy 2.7.9 for thirdparty/haproxy27 bullseye (apt.wm.o)
  • 08:54 apergos: UTC morning backport and config training window done
  • 08:38 ariel@deploy1002: Finished scap: Backport for [ruwiki] Add an editautoreviewprotected level protection (T337430) (duration: 08m 25s)
  • 08:31 ariel@deploy1002: ariel and superpes: Backport for [ruwiki] Add an editautoreviewprotected level protection (T337430) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:30 ariel@deploy1002: Started scap: Backport for [ruwiki] Add an editautoreviewprotected level protection (T337430)
  • 08:25 ariel@deploy1002: Finished scap: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412) (duration: 09m 16s)
  • 08:17 ariel@deploy1002: superpes and ariel: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:16 ariel@deploy1002: Started scap: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412)
  • 08:12 ariel@deploy1002: Finished scap: Backport for [itwiktionary] Add a tagline (T337688) (duration: 08m 07s)
  • 08:06 ariel@deploy1002: ariel and superpes: Backport for [itwiktionary] Add a tagline (T337688) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:04 ariel@deploy1002: Started scap: Backport for [itwiktionary] Add a tagline (T337688)
  • 07:49 ariel@deploy1002: Finished scap: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641) (duration: 09m 09s)
  • 07:41 ariel@deploy1002: ariel and superpes: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:40 ariel@deploy1002: Started scap: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641)
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49271 and previous config saved to /var/cache/conftool/dbconfig/20230608-073524-ladsgroup.json
  • 07:27 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834) (duration: 09m 19s)
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P49270 and previous config saved to /var/cache/conftool/dbconfig/20230608-072018-ladsgroup.json
  • 07:19 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:17 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834)
  • 07:14 elukey: delete pod kask-production-7dfdfc7cbc-2vw5q in wikikube codfw, since it was scheduled on a non dedicated node
  • 07:14 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290) (duration: 09m 52s)
  • 07:06 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P49268 and previous config saved to /var/cache/conftool/dbconfig/20230608-070512-ladsgroup.json
  • 07:04 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290)
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49267 and previous config saved to /var/cache/conftool/dbconfig/20230608-065006-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49266 and previous config saved to /var/cache/conftool/dbconfig/20230608-064508-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 06:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49265 and previous config saved to /var/cache/conftool/dbconfig/20230608-064447-ladsgroup.json
  • 06:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P49264 and previous config saved to /var/cache/conftool/dbconfig/20230608-062941-ladsgroup.json
  • 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P49263 and previous config saved to /var/cache/conftool/dbconfig/20230608-061435-ladsgroup.json
  • 06:10 elukey: kill remaining processes for `andyrussg` on stat100x nodes to unblock puppet
  • 05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49262 and previous config saved to /var/cache/conftool/dbconfig/20230608-055929-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49261 and previous config saved to /var/cache/conftool/dbconfig/20230608-055432-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49260 and previous config saved to /var/cache/conftool/dbconfig/20230608-055411-ladsgroup.json
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P49259 and previous config saved to /var/cache/conftool/dbconfig/20230608-053904-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P49258 and previous config saved to /var/cache/conftool/dbconfig/20230608-052358-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49257 and previous config saved to /var/cache/conftool/dbconfig/20230608-050852-ladsgroup.json
  • 05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49256 and previous config saved to /var/cache/conftool/dbconfig/20230608-050353-ladsgroup.json
  • 05:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 05:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49255 and previous config saved to /var/cache/conftool/dbconfig/20230608-050328-ladsgroup.json
  • 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P49254 and previous config saved to /var/cache/conftool/dbconfig/20230608-044821-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P49253 and previous config saved to /var/cache/conftool/dbconfig/20230608-043315-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49252 and previous config saved to /var/cache/conftool/dbconfig/20230608-041809-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49251 and previous config saved to /var/cache/conftool/dbconfig/20230608-041311-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 04:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 04:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49250 and previous config saved to /var/cache/conftool/dbconfig/20230608-040935-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P49249 and previous config saved to /var/cache/conftool/dbconfig/20230608-035428-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P49248 and previous config saved to /var/cache/conftool/dbconfig/20230608-033922-ladsgroup.json
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49247 and previous config saved to /var/cache/conftool/dbconfig/20230608-032416-ladsgroup.json
  • 03:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49246 and previous config saved to /var/cache/conftool/dbconfig/20230608-031911-ladsgroup.json
  • 03:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 03:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 03:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49245 and previous config saved to /var/cache/conftool/dbconfig/20230608-031901-ladsgroup.json
  • 03:11 eileen: civicrm upgraded from 066095b8 to 6f64e77d
  • 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P49244 and previous config saved to /var/cache/conftool/dbconfig/20230608-030355-ladsgroup.json
  • 02:54 samtar@deploy1002: Finished scap: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381) (duration: 09m 50s)
  • 02:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P49243 and previous config saved to /var/cache/conftool/dbconfig/20230608-024849-ladsgroup.json
  • 02:46 samtar@deploy1002: samtar: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 02:44 samtar@deploy1002: Started scap: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381)
  • 02:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49242 and previous config saved to /var/cache/conftool/dbconfig/20230608-023343-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49241 and previous config saved to /var/cache/conftool/dbconfig/20230608-022842-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49240 and previous config saved to /var/cache/conftool/dbconfig/20230608-022821-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P49239 and previous config saved to /var/cache/conftool/dbconfig/20230608-021315-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P49238 and previous config saved to /var/cache/conftool/dbconfig/20230608-015809-ladsgroup.json
  • 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49237 and previous config saved to /var/cache/conftool/dbconfig/20230608-014303-ladsgroup.json
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49236 and previous config saved to /var/cache/conftool/dbconfig/20230608-013808-ladsgroup.json
  • 01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 01:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49235 and previous config saved to /var/cache/conftool/dbconfig/20230608-013736-ladsgroup.json
  • 01:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 01:23 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P49234 and previous config saved to /var/cache/conftool/dbconfig/20230608-012230-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49233 and previous config saved to /var/cache/conftool/dbconfig/20230608-010853-ladsgroup.json
  • 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P49232 and previous config saved to /var/cache/conftool/dbconfig/20230608-010724-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P49231 and previous config saved to /var/cache/conftool/dbconfig/20230608-005347-ladsgroup.json
  • 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49230 and previous config saved to /var/cache/conftool/dbconfig/20230608-005218-ladsgroup.json
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49229 and previous config saved to /var/cache/conftool/dbconfig/20230608-004713-ladsgroup.json
  • 00:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 00:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49228 and previous config saved to /var/cache/conftool/dbconfig/20230608-004653-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P49227 and previous config saved to /var/cache/conftool/dbconfig/20230608-003841-ladsgroup.json
  • 00:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P49226 and previous config saved to /var/cache/conftool/dbconfig/20230608-003146-ladsgroup.json
  • 00:28 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 00:28 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-cluster
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49225 and previous config saved to /var/cache/conftool/dbconfig/20230608-002335-ladsgroup.json
  • 00:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P49224 and previous config saved to /var/cache/conftool/dbconfig/20230608-001640-ladsgroup.json
  • 00:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49223 and previous config saved to /var/cache/conftool/dbconfig/20230608-001555-ladsgroup.json
  • 00:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 00:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 00:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49222 and previous config saved to /var/cache/conftool/dbconfig/20230608-001534-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49221 and previous config saved to /var/cache/conftool/dbconfig/20230608-000134-ladsgroup.json
  • 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P49220 and previous config saved to /var/cache/conftool/dbconfig/20230608-000028-ladsgroup.json

2023-06-07

  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49219 and previous config saved to /var/cache/conftool/dbconfig/20230607-235624-ladsgroup.json
  • 23:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49218 and previous config saved to /var/cache/conftool/dbconfig/20230607-235603-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P49217 and previous config saved to /var/cache/conftool/dbconfig/20230607-234522-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P49216 and previous config saved to /var/cache/conftool/dbconfig/20230607-234057-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49215 and previous config saved to /var/cache/conftool/dbconfig/20230607-233016-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P49214 and previous config saved to /var/cache/conftool/dbconfig/20230607-232551-ladsgroup.json
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49213 and previous config saved to /var/cache/conftool/dbconfig/20230607-232223-ladsgroup.json
  • 23:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49212 and previous config saved to /var/cache/conftool/dbconfig/20230607-232203-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49211 and previous config saved to /var/cache/conftool/dbconfig/20230607-231045-ladsgroup.json
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P49210 and previous config saved to /var/cache/conftool/dbconfig/20230607-230657-ladsgroup.json
  • 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49209 and previous config saved to /var/cache/conftool/dbconfig/20230607-230540-ladsgroup.json
  • 23:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 22:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49208 and previous config saved to /var/cache/conftool/dbconfig/20230607-225926-ladsgroup.json
  • 22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P49207 and previous config saved to /var/cache/conftool/dbconfig/20230607-225150-ladsgroup.json
  • 22:45 zabe@deploy1002: Finished scap: T338287 (duration: 07m 30s)
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P49206 and previous config saved to /var/cache/conftool/dbconfig/20230607-224420-ladsgroup.json
  • 22:38 zabe@deploy1002: Started scap: T338287
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49205 and previous config saved to /var/cache/conftool/dbconfig/20230607-223644-ladsgroup.json
  • 22:34 zabe@deploy1002: Sync cancelled.
  • 22:34 zabe@deploy1002: zabe: Backport for Use cuc_timestamp as index field when reading old (T338287) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:32 zabe@deploy1002: Started scap: Backport for Use cuc_timestamp as index field when reading old (T338287)
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P49204 and previous config saved to /var/cache/conftool/dbconfig/20230607-222914-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49203 and previous config saved to /var/cache/conftool/dbconfig/20230607-222905-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49202 and previous config saved to /var/cache/conftool/dbconfig/20230607-222844-ladsgroup.json
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49201 and previous config saved to /var/cache/conftool/dbconfig/20230607-221408-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P49200 and previous config saved to /var/cache/conftool/dbconfig/20230607-221338-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49199 and previous config saved to /var/cache/conftool/dbconfig/20230607-220859-ladsgroup.json
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49198 and previous config saved to /var/cache/conftool/dbconfig/20230607-220821-ladsgroup.json
  • 22:05 eileen: civicrm upgraded from bcc8fccc to 066095b8
  • 22:05 zabe@deploy1002: Finished scap: Backport for Use cuc_timestamp as index field when reading old (T338287) (duration: 11m 48s)
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P49197 and previous config saved to /var/cache/conftool/dbconfig/20230607-215831-ladsgroup.json
  • 21:55 zabe@deploy1002: dreamyjazz and zabe: Backport for Use cuc_timestamp as index field when reading old (T338287) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:53 zabe@deploy1002: Started scap: Backport for Use cuc_timestamp as index field when reading old (T338287)
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P49196 and previous config saved to /var/cache/conftool/dbconfig/20230607-215315-ladsgroup.json
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49195 and previous config saved to /var/cache/conftool/dbconfig/20230607-214325-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P49194 and previous config saved to /var/cache/conftool/dbconfig/20230607-213809-ladsgroup.json
  • 21:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
  • 21:36 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
  • 21:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49193 and previous config saved to /var/cache/conftool/dbconfig/20230607-213530-ladsgroup.json
  • 21:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 21:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 21:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49192 and previous config saved to /var/cache/conftool/dbconfig/20230607-213509-ladsgroup.json
  • 21:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1016.eqiad.wmnet
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1016.eqiad.wmnet
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49191 and previous config saved to /var/cache/conftool/dbconfig/20230607-212303-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P49190 and previous config saved to /var/cache/conftool/dbconfig/20230607-212003-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49189 and previous config saved to /var/cache/conftool/dbconfig/20230607-211807-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 21:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49188 and previous config saved to /var/cache/conftool/dbconfig/20230607-211746-ladsgroup.json
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P49187 and previous config saved to /var/cache/conftool/dbconfig/20230607-210457-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P49186 and previous config saved to /var/cache/conftool/dbconfig/20230607-210240-ladsgroup.json
  • 20:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49185 and previous config saved to /var/cache/conftool/dbconfig/20230607-204951-ladsgroup.json
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P49184 and previous config saved to /var/cache/conftool/dbconfig/20230607-204734-ladsgroup.json
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49183 and previous config saved to /var/cache/conftool/dbconfig/20230607-204728-ladsgroup.json
  • 20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49182 and previous config saved to /var/cache/conftool/dbconfig/20230607-204652-ladsgroup.json
  • 20:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 20:35 catrope@deploy1002: Finished scap: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064) (duration: 12m 12s)
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49181 and previous config saved to /var/cache/conftool/dbconfig/20230607-203228-ladsgroup.json
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P49180 and previous config saved to /var/cache/conftool/dbconfig/20230607-203146-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49179 and previous config saved to /var/cache/conftool/dbconfig/20230607-202733-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 20:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 20:24 catrope@deploy1002: catrope: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49178 and previous config saved to /var/cache/conftool/dbconfig/20230607-202408-ladsgroup.json
  • 20:23 catrope@deploy1002: Started scap: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064)
  • 20:18 catrope@deploy1002: Finished scap: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728) (duration: 10m 53s)
  • 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P49177 and previous config saved to /var/cache/conftool/dbconfig/20230607-201640-ladsgroup.json
  • 20:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
  • 20:15 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
  • 20:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
  • 20:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
  • 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
  • 20:11 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
  • 20:09 catrope@deploy1002: catrope and essexigyan: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P49176 and previous config saved to /var/cache/conftool/dbconfig/20230607-200902-ladsgroup.json
  • 20:07 catrope@deploy1002: Started scap: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728)
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49175 and previous config saved to /var/cache/conftool/dbconfig/20230607-200134-ladsgroup.json
  • 19:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P49174 and previous config saved to /var/cache/conftool/dbconfig/20230607-195356-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49173 and previous config saved to /var/cache/conftool/dbconfig/20230607-195316-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49172 and previous config saved to /var/cache/conftool/dbconfig/20230607-195255-ladsgroup.json
  • 19:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 19:41 taavi: manually created 3 global accounts T338197
  • 19:40 bblack: cp*: disabling puppet temporarily out of an abundance of caution
  • 19:40 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 19:40 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49171 and previous config saved to /var/cache/conftool/dbconfig/20230607-193850-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P49170 and previous config saved to /var/cache/conftool/dbconfig/20230607-193749-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49169 and previous config saved to /var/cache/conftool/dbconfig/20230607-193357-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49168 and previous config saved to /var/cache/conftool/dbconfig/20230607-193326-ladsgroup.json
  • 19:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:23 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P49167 and previous config saved to /var/cache/conftool/dbconfig/20230607-192243-ladsgroup.json
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P49166 and previous config saved to /var/cache/conftool/dbconfig/20230607-191820-ladsgroup.json
  • 19:16 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
  • 19:11 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
  • 19:11 urandom: (Re)pooling codfw sessionstore — T337426
  • 19:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49165 and previous config saved to /var/cache/conftool/dbconfig/20230607-190737-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49164 and previous config saved to /var/cache/conftool/dbconfig/20230607-190514-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P49163 and previous config saved to /var/cache/conftool/dbconfig/20230607-190314-ladsgroup.json
  • 19:02 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 18:59 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 18:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49162 and previous config saved to /var/cache/conftool/dbconfig/20230607-184808-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49161 and previous config saved to /var/cache/conftool/dbconfig/20230607-184712-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49160 and previous config saved to /var/cache/conftool/dbconfig/20230607-184411-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49159 and previous config saved to /var/cache/conftool/dbconfig/20230607-184351-ladsgroup.json
  • 18:41 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P49158 and previous config saved to /var/cache/conftool/dbconfig/20230607-183206-ladsgroup.json
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P49157 and previous config saved to /var/cache/conftool/dbconfig/20230607-182845-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1135.eqiad.wmnet with reason: T338354
  • 18:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1135.eqiad.wmnet with reason: T338354
  • 18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
  • 18:20 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.12 refs T337526 (duration: 06m 05s)
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P49156 and previous config saved to /var/cache/conftool/dbconfig/20230607-181700-ladsgroup.json
  • 18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.12 refs T337526
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P49155 and previous config saved to /var/cache/conftool/dbconfig/20230607-181339-ladsgroup.json
  • 18:08 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@d90d5c8]: (no justification provided) (duration: 00m 33s)
  • 18:07 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@d90d5c8]: (no justification provided)
  • 18:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2014.codfw.wmnet with OS bullseye
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49154 and previous config saved to /var/cache/conftool/dbconfig/20230607-180154-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49153 and previous config saved to /var/cache/conftool/dbconfig/20230607-175833-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49152 and previous config saved to /var/cache/conftool/dbconfig/20230607-175347-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49151 and previous config saved to /var/cache/conftool/dbconfig/20230607-175337-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49150 and previous config saved to /var/cache/conftool/dbconfig/20230607-175327-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49149 and previous config saved to /var/cache/conftool/dbconfig/20230607-175316-ladsgroup.json
  • 17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=ats-be
  • 17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=cdn
  • 17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=ats-be
  • 17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=cdn
  • 17:46 inflatador: bking@wdqs depool wdqs2012 T321605
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P49148 and previous config saved to /var/cache/conftool/dbconfig/20230607-173821-ladsgroup.json
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P49147 and previous config saved to /var/cache/conftool/dbconfig/20230607-173810-ladsgroup.json
  • 17:34 cwhite@cumin2002: dbctl commit (dc=all): 'depool db1135', diff saved to https://phabricator.wikimedia.org/P49146 and previous config saved to /var/cache/conftool/dbconfig/20230607-173453-cwhite.json
  • 17:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
  • 17:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P49145 and previous config saved to /var/cache/conftool/dbconfig/20230607-172315-ladsgroup.json
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P49144 and previous config saved to /var/cache/conftool/dbconfig/20230607-172304-ladsgroup.json
  • 17:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49143 and previous config saved to /var/cache/conftool/dbconfig/20230607-170808-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49142 and previous config saved to /var/cache/conftool/dbconfig/20230607-170758-ladsgroup.json
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49141 and previous config saved to /var/cache/conftool/dbconfig/20230607-170551-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49140 and previous config saved to /var/cache/conftool/dbconfig/20230607-170530-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49139 and previous config saved to /var/cache/conftool/dbconfig/20230607-170252-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49138 and previous config saved to /var/cache/conftool/dbconfig/20230607-165934-ladsgroup.json
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P49137 and previous config saved to /var/cache/conftool/dbconfig/20230607-165024-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P49135 and previous config saved to /var/cache/conftool/dbconfig/20230607-164428-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P49134 and previous config saved to /var/cache/conftool/dbconfig/20230607-163518-ladsgroup.json
  • 16:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P49133 and previous config saved to /var/cache/conftool/dbconfig/20230607-162922-ladsgroup.json
  • 16:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2014']
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49132 and previous config saved to /var/cache/conftool/dbconfig/20230607-162012-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49131 and previous config saved to /var/cache/conftool/dbconfig/20230607-161800-ladsgroup.json
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49130 and previous config saved to /var/cache/conftool/dbconfig/20230607-161740-ladsgroup.json
  • 16:15 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49129 and previous config saved to /var/cache/conftool/dbconfig/20230607-161416-ladsgroup.json
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49128 and previous config saved to /var/cache/conftool/dbconfig/20230607-160912-ladsgroup.json
  • 16:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49127 and previous config saved to /var/cache/conftool/dbconfig/20230607-160851-ladsgroup.json
  • 16:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:04 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
  • 16:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P49126 and previous config saved to /var/cache/conftool/dbconfig/20230607-160234-ladsgroup.json
  • 16:00 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lists1003.wikimedia.org
  • 15:57 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:56 urandom: Beginning (3 hour) generated traffic testing of sessionstore.svc.codfw.wmnet — T337426
  • 15:56 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P49125 and previous config saved to /var/cache/conftool/dbconfig/20230607-155345-ladsgroup.json
  • 15:52 urandom: Upgrading Cassandra to 4.1.1, sessionstore2003 — T337426
  • 15:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1003.wikimedia.org
  • 15:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P49124 and previous config saved to /var/cache/conftool/dbconfig/20230607-154727-ladsgroup.json
  • 15:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 15:44 urandom: Upgrading Cassandra to 4.1.1, sessionstore2002 — T337426
  • 15:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for lvs2014 - pt1979@cumin2002"
  • 15:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for lvs2014 - pt1979@cumin2002"
  • 15:40 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2001.codfw.wmnet with reason: host reimage
  • 15:39 moritzm: installing isc-dhcp bugfix updates from Bullseye 11.7 point release
  • 15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P49123 and previous config saved to /var/cache/conftool/dbconfig/20230607-153839-ladsgroup.json
  • 15:37 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2001.codfw.wmnet with reason: host reimage
  • 15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:33 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49122 and previous config saved to /var/cache/conftool/dbconfig/20230607-153221-ladsgroup.json
  • 15:26 moritzm: rolling restart of FPM on mw canaries to pick up libwebp security updates
  • 15:26 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49121 and previous config saved to /var/cache/conftool/dbconfig/20230607-152456-ladsgroup.json
  • 15:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49120 and previous config saved to /var/cache/conftool/dbconfig/20230607-152425-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49119 and previous config saved to /var/cache/conftool/dbconfig/20230607-152333-ladsgroup.json
  • 15:23 elukey: all varnishkafka instances on caching nodes are getting restarted due to https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087 - T337825
  • 15:22 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:22 elukey: re-enable puppet on caching nodes
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:21 claime: Bumping prewarmparsoid concurrency to 45 in changeprop-jobqueue - T320534
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49118 and previous config saved to /var/cache/conftool/dbconfig/20230607-151835-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49117 and previous config saved to /var/cache/conftool/dbconfig/20230607-151815-ladsgroup.json
  • 15:17 moritzm: installing libwebp security updates on buster
  • 15:17 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2001.codfw.wmnet with OS bookworm
  • 15:17 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver2001.codfw.wmnet with OS bookworm
  • 15:14 urandom: Upgrading Cassandra to 4.1.1, sessionstore2001 — T337426
  • 15:14 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:10 elukey: disable puppet on all caching nodes to roll out a varnishkafka change (ref: https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087)
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P49116 and previous config saved to /var/cache/conftool/dbconfig/20230607-150919-ladsgroup.json
  • 15:08 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2001.codfw.wmnet with OS bookworm
  • 15:07 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
  • 15:06 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver2001.mgmt.codfw.wmnet on all recursors
  • 15:06 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetserver2001.mgmt.codfw.wmnet on all recursors
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P49115 and previous config saved to /var/cache/conftool/dbconfig/20230607-150309-ladsgroup.json
  • 15:02 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 15:02 urandom: de-pooling sessionstore/codfw — T337426
  • 14:56 sukhe: homer "cr*-codfw*" commit "Gerrit: 928068 remove decommissioned host lvs2010"
  • 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 14:54 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P49114 and previous config saved to /var/cache/conftool/dbconfig/20230607-145413-ladsgroup.json
  • 14:54 moritzm: installing postgresql 13 security updates (clients/libs, server instances all updated already)
  • 14:53 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 14:51 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 14:49 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2010.codfw.wmnet
  • 14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P49112 and previous config saved to /var/cache/conftool/dbconfig/20230607-144803-ladsgroup.json
  • 14:43 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 14:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1001.eqiad.wmnet with reason: host reimage
  • 14:40 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_eqiad and A:cp
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49111 and previous config saved to /var/cache/conftool/dbconfig/20230607-143907-ladsgroup.json
  • 14:39 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2010.codfw.wmnet
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1001.eqiad.wmnet with reason: host reimage
  • 14:36 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:33 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_eqiad and A:cp
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49110 and previous config saved to /var/cache/conftool/dbconfig/20230607-143256-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49109 and previous config saved to /var/cache/conftool/dbconfig/20230607-143235-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49108 and previous config saved to /var/cache/conftool/dbconfig/20230607-143215-ladsgroup.json
  • 14:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49107 and previous config saved to /var/cache/conftool/dbconfig/20230607-142756-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49106 and previous config saved to /var/cache/conftool/dbconfig/20230607-142736-ladsgroup.json
  • 14:26 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:25 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 14:24 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 14:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P49104 and previous config saved to /var/cache/conftool/dbconfig/20230607-141709-ladsgroup.json
  • 14:17 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2006-dev
  • 14:16 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2006-dev
  • 14:14 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2005-dev
  • 14:14 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
  • 14:14 aborrero@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudnet2006-dev
  • 14:13 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2006-dev
  • 14:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudnet2005-dev
  • 14:13 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P49103 and previous config saved to /var/cache/conftool/dbconfig/20230607-141230-ladsgroup.json
  • 14:10 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264) (duration: 09m 16s)
  • 14:05 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 14:03 lucaswerkmeister-wmde@deploy1002: d3r1ck01 and lucaswerkmeister-wmde: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P49102 and previous config saved to /var/cache/conftool/dbconfig/20230607-140203-ladsgroup.json
  • 14:01 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264)
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P49101 and previous config saved to /var/cache/conftool/dbconfig/20230607-135724-ladsgroup.json
  • 13:47 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable cache warming jobs for parsoid per default. (T329366) (duration: 10m 27s)
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49100 and previous config saved to /var/cache/conftool/dbconfig/20230607-134656-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49099 and previous config saved to /var/cache/conftool/dbconfig/20230607-134218-ladsgroup.json
  • 13:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
  • 13:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49098 and previous config saved to /var/cache/conftool/dbconfig/20230607-133933-ladsgroup.json
  • 13:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
  • 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49097 and previous config saved to /var/cache/conftool/dbconfig/20230607-133854-ladsgroup.json
  • 13:38 lucaswerkmeister-wmde@deploy1002: daniel and lucaswerkmeister-wmde: Backport for Enable cache warming jobs for parsoid per default. (T329366) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49096 and previous config saved to /var/cache/conftool/dbconfig/20230607-133725-ladsgroup.json
  • 13:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable cache warming jobs for parsoid per default. (T329366)
  • 13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49095 and previous config saved to /var/cache/conftool/dbconfig/20230607-133704-ladsgroup.json
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 13:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 13:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P49093 and previous config saved to /var/cache/conftool/dbconfig/20230607-132348-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49092 and previous config saved to /var/cache/conftool/dbconfig/20230607-132158-ladsgroup.json
  • 13:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:20 topranks: removing remote vlan configuration from lsw1-f1-eqiad T322937
  • 13:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:10 ladsgroup@deploy1002: Finished scap: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary"" (duration: 07m 11s)
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P49090 and previous config saved to /var/cache/conftool/dbconfig/20230607-130841-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49089 and previous config saved to /var/cache/conftool/dbconfig/20230607-130651-ladsgroup.json
  • 13:04 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary"" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 13:03 ladsgroup@deploy1002: Started scap: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary""
  • 13:02 cmooney@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 11m 45s)
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49088 and previous config saved to /var/cache/conftool/dbconfig/20230607-125335-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49087 and previous config saved to /var/cache/conftool/dbconfig/20230607-125145-ladsgroup.json
  • 12:51 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 12:50 topranks: Depooling lvs1019 to move link from lsw1-f1-eqiad to ssw1-f1-eqiad
  • 12:50 cmooney@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
  • 12:46 Amir1: mwscript maintenance/storage/moveToExternal.php --iconv DB cluster27 on dawiktionary and svwiktionary (T128155 and T128156)
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49086 and previous config saved to /var/cache/conftool/dbconfig/20230607-124543-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49085 and previous config saved to /var/cache/conftool/dbconfig/20230607-123926-ladsgroup.json
  • 12:37 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:37 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet - aborrero@cumin2002"
  • 12:36 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet - aborrero@cumin2002"
  • 12:33 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49084 and previous config saved to /var/cache/conftool/dbconfig/20230607-123002-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P49083 and previous config saved to /var/cache/conftool/dbconfig/20230607-122420-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P49082 and previous config saved to /var/cache/conftool/dbconfig/20230607-121456-ladsgroup.json
  • 12:13 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 12:12 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver1001.eqiad.wmnet on all recursors
  • 12:12 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver1001.eqiad.wmnet on all recursors
  • 12:11 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver.eqiad.wmnet on all recursors
  • 12:11 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver.eqiad.wmnet on all recursors
  • 12:11 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:10 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
  • 12:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P49081 and previous config saved to /var/cache/conftool/dbconfig/20230607-120914-ladsgroup.json
  • 12:07 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:07 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver1001
  • 12:06 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1001
  • 12:06 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2001
  • 12:04 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2001
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P49080 and previous config saved to /var/cache/conftool/dbconfig/20230607-115950-ladsgroup.json
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49079 and previous config saved to /var/cache/conftool/dbconfig/20230607-115408-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49078 and previous config saved to /var/cache/conftool/dbconfig/20230607-115156-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49077 and previous config saved to /var/cache/conftool/dbconfig/20230607-115124-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 11:48 jbond@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host puppetserver2001
  • 11:46 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2001
  • 11:46 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
  • 11:45 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49076 and previous config saved to /var/cache/conftool/dbconfig/20230607-114444-ladsgroup.json
  • 11:44 jbond@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host puppetserver1001
  • 11:43 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1001
  • 11:43 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49075 and previous config saved to /var/cache/conftool/dbconfig/20230607-114120-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49074 and previous config saved to /var/cache/conftool/dbconfig/20230607-114059-ladsgroup.json
  • 11:40 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 11:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:30 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster2005
  • 11:30 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster1005
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1005 decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 11:29 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 11:27 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1005 decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P49073 and previous config saved to /var/cache/conftool/dbconfig/20230607-112553-ladsgroup.json
  • 11:24 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:24 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2005
  • 11:23 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts puppetmaster1005
  • 11:22 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster1005
  • 11:17 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetmaster1005
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P49072 and previous config saved to /var/cache/conftool/dbconfig/20230607-111047-ladsgroup.json
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49071 and previous config saved to /var/cache/conftool/dbconfig/20230607-105541-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49070 and previous config saved to /var/cache/conftool/dbconfig/20230607-105215-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49069 and previous config saved to /var/cache/conftool/dbconfig/20230607-105154-ladsgroup.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P49068 and previous config saved to /var/cache/conftool/dbconfig/20230607-103648-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P49066 and previous config saved to /var/cache/conftool/dbconfig/20230607-102141-ladsgroup.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49065 and previous config saved to /var/cache/conftool/dbconfig/20230607-100635-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49064 and previous config saved to /var/cache/conftool/dbconfig/20230607-100307-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49063 and previous config saved to /var/cache/conftool/dbconfig/20230607-100247-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P49062 and previous config saved to /var/cache/conftool/dbconfig/20230607-094740-ladsgroup.json
  • 09:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P49061 and previous config saved to /var/cache/conftool/dbconfig/20230607-093234-ladsgroup.json
  • 09:21 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49060 and previous config saved to /var/cache/conftool/dbconfig/20230607-091728-ladsgroup.json
  • 09:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49059 and previous config saved to /var/cache/conftool/dbconfig/20230607-091402-ladsgroup.json
  • 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49058 and previous config saved to /var/cache/conftool/dbconfig/20230607-091341-ladsgroup.json
  • 09:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 09:06 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 09:00 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 08:59 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_eqiad and A:cp
  • 08:59 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_eqiad and A:cp
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P49057 and previous config saved to /var/cache/conftool/dbconfig/20230607-085835-ladsgroup.json
  • 08:49 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P49056 and previous config saved to /var/cache/conftool/dbconfig/20230607-084329-ladsgroup.json
  • 08:34 fabfur: disable puppet on A:cp-eqiad for varnish <-> haproxy port 80 swap
  • 08:29 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49055 and previous config saved to /var/cache/conftool/dbconfig/20230607-082823-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49054 and previous config saved to /var/cache/conftool/dbconfig/20230607-082500-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49053 and previous config saved to /var/cache/conftool/dbconfig/20230607-082434-ladsgroup.json
  • 08:22 moritzm: uploaded ruby 2.5.5-3+deb10u5+wmf1 to apt.wikimedia.org, unbreaking Puppet runs after latest Ruby update for Buster T338294
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P49052 and previous config saved to /var/cache/conftool/dbconfig/20230607-080928-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P49051 and previous config saved to /var/cache/conftool/dbconfig/20230607-075422-ladsgroup.json
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49050 and previous config saved to /var/cache/conftool/dbconfig/20230607-073916-ladsgroup.json
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49049 and previous config saved to /var/cache/conftool/dbconfig/20230607-073554-ladsgroup.json
  • 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 07:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49048 and previous config saved to /var/cache/conftool/dbconfig/20230607-073533-ladsgroup.json
  • 07:22 kartik@deploy1002: Finished scap: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922) (duration: 18m 06s)
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P49047 and previous config saved to /var/cache/conftool/dbconfig/20230607-072027-ladsgroup.json
  • 07:06 kartik@deploy1002: kartik: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P49046 and previous config saved to /var/cache/conftool/dbconfig/20230607-070521-ladsgroup.json
  • 07:04 kartik@deploy1002: Started scap: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922)
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49045 and previous config saved to /var/cache/conftool/dbconfig/20230607-065015-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49044 and previous config saved to /var/cache/conftool/dbconfig/20230607-064652-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 06:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49043 and previous config saved to /var/cache/conftool/dbconfig/20230607-064631-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49042 and previous config saved to /var/cache/conftool/dbconfig/20230607-064215-ladsgroup.json
  • 06:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P49041 and previous config saved to /var/cache/conftool/dbconfig/20230607-063125-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49040 and previous config saved to /var/cache/conftool/dbconfig/20230607-062709-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P49039 and previous config saved to /var/cache/conftool/dbconfig/20230607-061618-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49038 and previous config saved to /var/cache/conftool/dbconfig/20230607-061203-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49037 and previous config saved to /var/cache/conftool/dbconfig/20230607-060112-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49036 and previous config saved to /var/cache/conftool/dbconfig/20230607-055746-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49035 and previous config saved to /var/cache/conftool/dbconfig/20230607-055726-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49034 and previous config saved to /var/cache/conftool/dbconfig/20230607-055655-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49033 and previous config saved to /var/cache/conftool/dbconfig/20230607-055320-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49032 and previous config saved to /var/cache/conftool/dbconfig/20230607-055259-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P49031 and previous config saved to /var/cache/conftool/dbconfig/20230607-054220-ladsgroup.json
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49030 and previous config saved to /var/cache/conftool/dbconfig/20230607-053753-ladsgroup.json
  • 05:28 kart_: Updated cxserver to 2023-06-07-044025-production (T337290, T337669, T337834)
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P49029 and previous config saved to /var/cache/conftool/dbconfig/20230607-052713-ladsgroup.json
  • 05:25 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:22 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49028 and previous config saved to /var/cache/conftool/dbconfig/20230607-052247-ladsgroup.json
  • 05:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:17 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:17 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49027 and previous config saved to /var/cache/conftool/dbconfig/20230607-051207-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49026 and previous config saved to /var/cache/conftool/dbconfig/20230607-050844-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 05:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T336886)', diff saved to https://phabricator.wikimedia.org/P49025 and previous config saved to /var/cache/conftool/dbconfig/20230607-050823-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49024 and previous config saved to /var/cache/conftool/dbconfig/20230607-050740-ladsgroup.json
  • 05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49023 and previous config saved to /var/cache/conftool/dbconfig/20230607-050258-ladsgroup.json
  • 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:02 kart_: Updated MinT to 2023-06-06-120533-production (T337910, T337686, T337708)
  • 05:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T336886)', diff saved to https://phabricator.wikimedia.org/P49022 and previous config saved to /var/cache/conftool/dbconfig/20230607-050237-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to