Server Admin Log: Difference between revisions

imported>Stashbot
(ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance)
imported>Stashbot
(TimStarling: multi-DC stage 3: 2% of codfw/ulsfo/eqsin traffic going to codfw appservers, rolling out via puppet 00:54-01:24)
== 2022-09-06 ==
* 01:03 TimStarling: multi-DC stage 3: 2% of codfw/ulsfo/eqsin traffic going to codfw appservers, rolling out via puppet 00:54-01:24
* 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
== 2022-09-05 ==
* 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33811 and previous config saved to /var/cache/conftool/dbconfig/20220905-232237-ladsgroup.json
* 23:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 23:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33810 and previous config saved to /var/cache/conftool/dbconfig/20220905-232216-ladsgroup.json
* 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33809 and previous config saved to /var/cache/conftool/dbconfig/20220905-230709-ladsgroup.json
* 22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33808 and previous config saved to /var/cache/conftool/dbconfig/20220905-225203-ladsgroup.json
* 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33807 and previous config saved to /var/cache/conftool/dbconfig/20220905-223657-ladsgroup.json
* 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33806 and previous config saved to /var/cache/conftool/dbconfig/20220905-212415-ladsgroup.json
* 21:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 21:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33805 and previous config saved to /var/cache/conftool/dbconfig/20220905-212343-ladsgroup.json
* 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33804 and previous config saved to /var/cache/conftool/dbconfig/20220905-210837-ladsgroup.json
* 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33803 and previous config saved to /var/cache/conftool/dbconfig/20220905-205330-ladsgroup.json
* 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33802 and previous config saved to /var/cache/conftool/dbconfig/20220905-203824-ladsgroup.json
* 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33801 and previous config saved to /var/cache/conftool/dbconfig/20220905-192554-ladsgroup.json
* 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33800 and previous config saved to /var/cache/conftool/dbconfig/20220905-191532-ladsgroup.json
* 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33799 and previous config saved to /var/cache/conftool/dbconfig/20220905-190027-ladsgroup.json
* 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33798 and previous config saved to /var/cache/conftool/dbconfig/20220905-184522-ladsgroup.json
* 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33797 and previous config saved to /var/cache/conftool/dbconfig/20220905-183017-ladsgroup.json
* 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33796 and previous config saved to /var/cache/conftool/dbconfig/20220905-182510-ladsgroup.json
* 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33795 and previous config saved to /var/cache/conftool/dbconfig/20220905-181003-ladsgroup.json
* 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33794 and previous config saved to /var/cache/conftool/dbconfig/20220905-175457-ladsgroup.json
* 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33793 and previous config saved to /var/cache/conftool/dbconfig/20220905-175423-ladsgroup.json
* 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
* 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
* 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33792 and previous config saved to /var/cache/conftool/dbconfig/20220905-173951-ladsgroup.json
* 16:27 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 16:26 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 15:30 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1038.eqiad.wmnet
* 15:30 moritzm: installing apache2 security updates
* 15:28 claime: depooled wtp1040.eqiad.wmnet from parsoid cluster [[phab:T307219|T307219]]
* 15:19 claime: pooled parse1007.eqiad.wmnet (php 7.4 only) in parsoid cluster [[phab:T307219|T307219]]
* 15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1007,parse1007.mgmt
* 15:16 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1007,parse1007.mgmt
* 15:09 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1007.eqiad.wmnet
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33791 and previous config saved to /var/cache/conftool/dbconfig/20220905-150837-ladsgroup.json
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33790 and previous config saved to /var/cache/conftool/dbconfig/20220905-150758-ladsgroup.json
* 15:04 moritzm: updating docker.io on gitlab-runners
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33789 and previous config saved to /var/cache/conftool/dbconfig/20220905-145252-ladsgroup.json
* 14:48 claime: Set wtp103[6-7].eqiad.wmnet inactive pending decommission [[phab:T317025|T317025]]
* 14:47 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1037.eqiad.wmnet
* 14:46 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1036.eqiad.wmnet
* 14:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1036-1038].eqiad.wmnet with reason: Downtiming replace wtp servers
* 14:40 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1036-1038].eqiad.wmnet with reason: Downtiming replace wtp servers
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33788 and previous config saved to /var/cache/conftool/dbconfig/20220905-143746-ladsgroup.json
* 14:33 claime: depooled wtp1039.eqiad.wmnet from parsoid cluster [[phab:T307219|T307219]]
* 14:30 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 14:30 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 14:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 14:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 14:28 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 14:28 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 14:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 14:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 14:23 claime: pooled parse1006.eqiad.wmnet (php 7.4 only) in parsoid cluster [[phab:T307219|T307219]]
* 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33786 and previous config saved to /var/cache/conftool/dbconfig/20220905-142240-ladsgroup.json
* 14:21 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1006,parse1006.mgmt
* 14:21 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1006,parse1006.mgmt
* 14:11 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1006.eqiad.wmnet
* 14:02 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 14:02 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 14:01 claime: depooled wtp1038.eqiad.wmnet from parsoid cluster [[phab:T307219|T307219]]
* 13:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:48 claime: pooled parse1005.eqiad.wmnet (php 7.4 only) in parsoid cluster [[phab:T307219|T307219]]
* 13:41 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:31 addshore: wdqs1009 sudo systemctl stop wdqs-blazegraph.service
* 13:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1011.eqiad.wmnet with OS bullseye
* 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on puppetdb2002.codfw.wmnet with reason: Temporarily stop puppetdb
* 13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on puppetdb2002.codfw.wmnet with reason: Temporarily stop puppetdb
* 13:10 urbanecm: UTC afternoon B&C window done
* 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33785 and previous config saved to /var/cache/conftool/dbconfig/20220905-130944-ladsgroup.json
* 13:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 13:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|edbcee4d9a901ce475ebcc53e4c4bc18e04bc2b8}}: Enable partial action blocks on fawiki ([[phab:T315525|T315525]]) (duration: 03m 34s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:07 moritzm: disabling puppet in codfw and the edges temporarily
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
* 13:01 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
* 12:48 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
* 12:47 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1007.eqiad.wmnet with OS bullseye
* 12:33 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host datahubsearch1003.eqiad.wmnet
* 12:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1005,parse1005.mgmt
* 12:31 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1005,parse1005.mgmt
* 12:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
* 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
* 12:20 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 18 hosts with reason: Downtime pending inclusion in production
* 12:20 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 18 hosts with reason: Downtime pending inclusion in production
* 12:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
* 12:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 12:16 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1005.eqiad.wmnet
* 12:14 claime: depooled wtp1037.eqiad.wmnet from parsoid cluster [[phab:T312638|T312638]]
* 12:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
* 12:10 tstarling@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db[2142-2144].codfw.wmnet
* 12:10 tstarling@cumin1001: START - Cookbook sre.hosts.remove-downtime for db[2142-2144].codfw.wmnet
* 12:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1004.mgmt
* 12:10 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1004.mgmt
* 12:10 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
* 12:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
* 11:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[1001-1004].eqiad.wmnet
* 11:56 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[1001-1004].eqiad.wmnet
* 11:55 TimStarling: on db2142: rejecting inbound mysql traffic [[phab:T316847|T316847]]
* 11:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
* 11:53 claime: pooled parse1004.eqiad.wmnet (php 7.4 only) in parsoid cluster [[phab:T312638|T312638]]
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1004.eqiad.wmnet
* 11:52 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1004.eqiad.wmnet
* 11:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
* 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P33784 and previous config saved to /var/cache/conftool/dbconfig/20220905-114352-ladsgroup.json
* 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-eqiad:xe-4/1/3
* 11:41 jnuche@deploy1002: Installation of scap version "4.16.0" completed for 584 hosts
* 11:41 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-eqiad:xe-4/1/3
* 11:40 jnuche@deploy1002: Installing scap version "4.16.0" for 584 hosts
* 11:37 TimStarling: on db2142: dropping inbound mysql traffic [[phab:T316847|T316847]]
* 11:36 claime: Set wtp103[4-5].eqiad.wmnet inactive pending decommission https://phabricator.wikimedia.org/T317025
* 11:34 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1035.eqiad.wmnet
* 11:34 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1034.eqiad.wmnet
* 11:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1034-1036].eqiad.wmnet with reason: Downtiming replaced wtp servers
* 11:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1034-1036].eqiad.wmnet with reason: Downtiming replaced wtp servers
* 11:30 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1004.eqiad.wmnet
* 11:29 TimStarling: on db2142: set master_delay=30 and restarted replication [[phab:T316847|T316847]]
* 11:27 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1003.eqiad.wmnet
* 11:27 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1003.eqiad.wmnet
* 11:24 claime: depooled wtp1036.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
* 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33783 and previous config saved to /var/cache/conftool/dbconfig/20220905-112308-ladsgroup.json
* 11:18 TimStarling: on db2142: stopped mariadb replication
* 11:16 claime: pooled parse1003.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
* 11:16 tstarling@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2142-2144].codfw.wmnet with reason: [[phab:T316847|T316847]] x2 failure test
* 11:15 tstarling@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2142-2144].codfw.wmnet with reason: [[phab:T316847|T316847]] x2 failure test
* 11:15 cgoubert@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet
* 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P33782 and previous config saved to /var/cache/conftool/dbconfig/20220905-110801-ladsgroup.json
* 11:04 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet
* 10:55 Emperor: set thanos ring replicas to 3.90 [[phab:T311690|T311690]]
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P33781 and previous config saved to /var/cache/conftool/dbconfig/20220905-105255-ladsgroup.json
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33780 and previous config saved to /var/cache/conftool/dbconfig/20220905-103749-ladsgroup.json
* 10:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1015.eqiad.wmnet
* 10:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 10:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1015.eqiad.wmnet
* 10:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 10:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1014.eqiad.wmnet
* 10:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1014.eqiad.wmnet
* 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1013.eqiad.wmnet
* 10:13 XioNoX: upgrade python-pynetbox to 6.6 on netbox frontends - [[phab:T310745|T310745]]
* 10:11 hnowlan@deploy1002: Finished deploy [restbase/deploy@79b3cd2]: Add guwwiktionary and bjnwiktionary [[phab:T309058|T309058]] [[phab:T312216|T312216]] (duration: 15m 05s)
* 10:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1013.eqiad.wmnet
* 09:56 hnowlan@deploy1002: Started deploy [restbase/deploy@79b3cd2]: Add guwwiktionary and bjnwiktionary [[phab:T309058|T309058]] [[phab:T312216|T312216]]
* 09:47 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 09:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
* 09:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1012.eqiad.wmnet
* 09:37 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 09:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
* 09:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 09:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1012.eqiad.wmnet
* 09:25 btullis: deployed calico to dse-k8s cluster [[phab:T310174|T310174]]
* 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33779 and previous config saved to /var/cache/conftool/dbconfig/20220905-092338-ladsgroup.json
* 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
* 09:23 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 09:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1010.eqiad.wmnet
* 09:17 XioNoX: Squid: permit production networks instead of aggregate_networks - [[phab:T265864|T265864]]
* 09:17 moritzm: installing flac security updates
* 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1010.eqiad.wmnet
* 09:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1008.eqiad.wmnet
* 09:05 hnowlan@deploy1002: Finished deploy [restbase/deploy@a571f9a]: Add pcmwiki [[phab:T310880|T310880]] (duration: 01m 06s)
* 09:04 hnowlan@deploy1002: Started deploy [restbase/deploy@a571f9a]: Add pcmwiki [[phab:T310880|T310880]]
* 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1008.eqiad.wmnet
* 09:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1006.eqiad.wmnet
* 08:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1006.eqiad.wmnet
* 08:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
* 08:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
* 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 08:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:829562{{!}}Stop writing to old templatelinks fields in s7 (T312865)]] (duration: 03m 51s)
* 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
* 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:01 XioNoX: rename Telia to Arelion in Netbox
* 07:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:32 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:829556{{!}}Make English Wikipedia read new on templatelinks migration (T306673)]] (duration: 03m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|739920ceb09358a2ea89d82494522876fffd2621}}: Fix missing logo for mniwiktionary and frwikiquote ([[phab:T317004|T317004]]) (duration: 03m 36s)
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|ff2e1082d8b3fe0ba93cd37a1b516dece84a834b}}: Upload missing logo for mniwiktionary and frwikiquote ([[phab:T317004|T317004]]) (duration: 03m 50s)
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:19 moritzm: installing ghostscript security updates
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:823678{{!}}Move 10% of traffic to php 7.4 (T271736)]] (duration: 03m 50s)
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-eqiad:xe-4/1/3
* 06:28 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-eqiad:xe-4/1/3
* 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 06:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 02:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
* 02:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
* 02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P33778 and previous config saved to /var/cache/conftool/dbconfig/20220905-024602-ladsgroup.json
* 00:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance

Revision as of 01:03, 6 September 2022

2022-09-06

  • 01:03 TimStarling: multi-DC stage 3: 2% of codfw/ulsfo/eqsin traffic going to codfw appservers, rolling out via puppet 00:54-01:24
  • 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance

2022-09-05

  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T314041)', diff saved to https://phabricator.wikimedia.org/P33811 and previous config saved to /var/cache/conftool/dbconfig/20220905-232237-ladsgroup.json
  • 23:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T314041)', diff saved to https://phabricator.wikimedia.org/P33810 and previous config saved to /var/cache/conftool/dbconfig/20220905-232216-ladsgroup.json
  • 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33809 and previous config saved to /var/cache/conftool/dbconfig/20220905-230709-ladsgroup.json
  • 22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33808 and previous config saved to /var/cache/conftool/dbconfig/20220905-225203-ladsgroup.json
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T314041)', diff saved to https://phabricator.wikimedia.org/P33807 and previous config saved to /var/cache/conftool/dbconfig/20220905-223657-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T314041)', diff saved to https://phabricator.wikimedia.org/P33806 and previous config saved to /var/cache/conftool/dbconfig/20220905-212415-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33805 and previous config saved to /var/cache/conftool/dbconfig/20220905-212343-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33804 and previous config saved to /var/cache/conftool/dbconfig/20220905-210837-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33803 and previous config saved to /var/cache/conftool/dbconfig/20220905-205330-ladsgroup.json
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33802 and previous config saved to /var/cache/conftool/dbconfig/20220905-203824-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33801 and previous config saved to /var/cache/conftool/dbconfig/20220905-192554-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33800 and previous config saved to /var/cache/conftool/dbconfig/20220905-191532-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33799 and previous config saved to /var/cache/conftool/dbconfig/20220905-190027-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33798 and previous config saved to /var/cache/conftool/dbconfig/20220905-184522-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Maint needs to be redone', diff saved to https://phabricator.wikimedia.org/P33797 and previous config saved to /var/cache/conftool/dbconfig/20220905-183017-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312863)', diff saved to https://phabricator.wikimedia.org/P33796 and previous config saved to /var/cache/conftool/dbconfig/20220905-182510-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33795 and previous config saved to /var/cache/conftool/dbconfig/20220905-181003-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33794 and previous config saved to /var/cache/conftool/dbconfig/20220905-175457-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T314041)', diff saved to https://phabricator.wikimedia.org/P33793 and previous config saved to /var/cache/conftool/dbconfig/20220905-175423-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312863)', diff saved to https://phabricator.wikimedia.org/P33792 and previous config saved to /var/cache/conftool/dbconfig/20220905-173951-ladsgroup.json
  • 16:27 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 16:26 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
  • 15:30 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1038.eqiad.wmnet
  • 15:30 moritzm: installing apache2 security updates
  • 15:28 claime: depooled wtp1040.eqiad.wmnet from parsoid cluster T307219
  • 15:19 claime: pooled parse1007.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1007,parse1007.mgmt
  • 15:16 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1007,parse1007.mgmt
  • 15:09 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1007.eqiad.wmnet
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33791 and previous config saved to /var/cache/conftool/dbconfig/20220905-150837-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T314041)', diff saved to https://phabricator.wikimedia.org/P33790 and previous config saved to /var/cache/conftool/dbconfig/20220905-150758-ladsgroup.json
  • 15:04 moritzm: updating docker.io on gitlab-runners
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33789 and previous config saved to /var/cache/conftool/dbconfig/20220905-145252-ladsgroup.json
  • 14:48 claime: Set wtp103[6-7].eqiad.wmnet inactive pending decommission T317025
  • 14:47 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1037.eqiad.wmnet
  • 14:46 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1036.eqiad.wmnet
  • 14:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1036-1038].eqiad.wmnet with reason: Downtiming replace wtp servers
  • 14:40 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1036-1038].eqiad.wmnet with reason: Downtiming replace wtp servers
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33788 and previous config saved to /var/cache/conftool/dbconfig/20220905-143746-ladsgroup.json
  • 14:33 claime: depooled wtp1039.eqiad.wmnet from parsoid cluster T307219
  • 14:30 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:30 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:28 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:28 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:23 claime: pooled parse1006.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T314041)', diff saved to https://phabricator.wikimedia.org/P33786 and previous config saved to /var/cache/conftool/dbconfig/20220905-142240-ladsgroup.json
  • 14:21 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1006,parse1006.mgmt
  • 14:21 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1006,parse1006.mgmt
  • 14:11 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1006.eqiad.wmnet
  • 14:02 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:02 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:01 claime: depooled wtp1038.eqiad.wmnet from parsoid cluster T307219
  • 13:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:48 claime: pooled parse1005.eqiad.wmnet (php 7.4 only) in parsoid cluster T307219
  • 13:41 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:31 addshore: wdqs1009 sudo systemctl stop wdqs-blazegraph.service
  • 13:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1011.eqiad.wmnet with OS bullseye
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on puppetdb2002.codfw.wmnet with reason: Temporarily stop puppetdb
  • 13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on puppetdb2002.codfw.wmnet with reason: Temporarily stop puppetdb
  • 13:10 urbanecm: UTC afternoon B&C window done
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T314041)', diff saved to https://phabricator.wikimedia.org/P33785 and previous config saved to /var/cache/conftool/dbconfig/20220905-130944-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: edbcee4: Enable partial action blocks on fawiki (T315525) (duration: 03m 34s)
  • 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:07 moritzm: disabling puppet in codfw and the edges temporarily
  • 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 13:01 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 12:48 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
  • 12:47 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 12:33 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host datahubsearch1003.eqiad.wmnet
  • 12:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1005,parse1005.mgmt
  • 12:31 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1005,parse1005.mgmt
  • 12:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 12:20 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 18 hosts with reason: Downtime pending inclusion in production
  • 12:20 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 18 hosts with reason: Downtime pending inclusion in production
  • 12:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 12:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 12:16 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1005.eqiad.wmnet
  • 12:14 claime: depooled wtp1037.eqiad.wmnet from parsoid cluster T312638
  • 12:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 12:10 tstarling@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db[2142-2144].codfw.wmnet
  • 12:10 tstarling@cumin1001: START - Cookbook sre.hosts.remove-downtime for db[2142-2144].codfw.wmnet
  • 12:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1004.mgmt
  • 12:10 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1004.mgmt
  • 12:10 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 12:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 11:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[1001-1004].eqiad.wmnet
  • 11:56 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[1001-1004].eqiad.wmnet
  • 11:55 TimStarling: on db2142: rejecting inbound mysql traffic T316847
  • 11:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
  • 11:53 claime: pooled parse1004.eqiad.wmnet (php 7.4 only) in parsoid cluster T312638
  • 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1004.eqiad.wmnet
  • 11:52 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1004.eqiad.wmnet
  • 11:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T312863)', diff saved to https://phabricator.wikimedia.org/P33784 and previous config saved to /var/cache/conftool/dbconfig/20220905-114352-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-eqiad:xe-4/1/3
  • 11:41 jnuche@deploy1002: Installation of scap version "4.16.0" completed for 584 hosts
  • 11:41 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-eqiad:xe-4/1/3
  • 11:40 jnuche@deploy1002: Installing scap version "4.16.0" for 584 hosts
  • 11:37 TimStarling: on db2142: dropping inbound mysql traffic T316847
  • 11:36 claime: Set wtp103[4-5].eqiad.wmnet inactive pending decommission https://phabricator.wikimedia.org/T317025
  • 11:34 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1035.eqiad.wmnet
  • 11:34 cgoubert@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=parsoid,name=wtp1034.eqiad.wmnet
  • 11:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wtp[1034-1036].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 11:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wtp[1034-1036].eqiad.wmnet with reason: Downtiming replaced wtp servers
  • 11:30 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1004.eqiad.wmnet
  • 11:29 TimStarling: on db2142: set master_delay=30 and restarted replication T316847
  • 11:27 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1003.eqiad.wmnet
  • 11:27 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1003.eqiad.wmnet
  • 11:24 claime: depooled wtp1036.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T314041)', diff saved to https://phabricator.wikimedia.org/P33783 and previous config saved to /var/cache/conftool/dbconfig/20220905-112308-ladsgroup.json
  • 11:18 TimStarling: on db2142: stopped mariadb replication
  • 11:16 claime: pooled parse1003.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
  • 11:16 tstarling@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2142-2144].codfw.wmnet with reason: T316847 x2 failure test
  • 11:15 tstarling@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2142-2144].codfw.wmnet with reason: T316847 x2 failure test
  • 11:15 cgoubert@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P33782 and previous config saved to /var/cache/conftool/dbconfig/20220905-110801-ladsgroup.json
  • 11:04 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1003.eqiad.wmnet
  • 10:55 Emperor: set thanos ring replicas to 3.90 T311690
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P33781 and previous config saved to /var/cache/conftool/dbconfig/20220905-105255-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T314041)', diff saved to https://phabricator.wikimedia.org/P33780 and previous config saved to /var/cache/conftool/dbconfig/20220905-103749-ladsgroup.json
  • 10:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1015.eqiad.wmnet
  • 10:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1015.eqiad.wmnet
  • 10:25 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1014.eqiad.wmnet
  • 10:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1014.eqiad.wmnet
  • 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1013.eqiad.wmnet
  • 10:13 XioNoX: upgrade python-pynetbox to 6.6 on netbox frontends - T310745
  • 10:11 hnowlan@deploy1002: Finished deploy [restbase/deploy@79b3cd2]: Add guwwiktionary and bjnwiktionary T309058 T312216 (duration: 15m 05s)
  • 10:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1013.eqiad.wmnet
  • 09:56 hnowlan@deploy1002: Started deploy [restbase/deploy@79b3cd2]: Add guwwiktionary and bjnwiktionary T309058 T312216
  • 09:47 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 09:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1012.eqiad.wmnet
  • 09:37 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 09:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1012.eqiad.wmnet
  • 09:25 btullis: deployed calico to dse-k8s cluster T310174
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T314041)', diff saved to https://phabricator.wikimedia.org/P33779 and previous config saved to /var/cache/conftool/dbconfig/20220905-092338-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 09:23 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 09:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1010.eqiad.wmnet
  • 09:17 XioNoX: Squid: permit production networks instead of aggregate_networks - T265864
  • 09:17 moritzm: installing flac security updates
  • 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1010.eqiad.wmnet
  • 09:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1008.eqiad.wmnet
  • 09:05 hnowlan@deploy1002: Finished deploy [restbase/deploy@a571f9a]: Add pcmwiki T310880 (duration: 01m 06s)
  • 09:04 hnowlan@deploy1002: Started deploy [restbase/deploy@a571f9a]: Add pcmwiki T310880
  • 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1008.eqiad.wmnet
  • 09:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1006.eqiad.wmnet
  • 08:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1006.eqiad.wmnet
  • 08:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
  • 08:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to old templatelinks fields in s7 (T312865) (duration: 03m 51s)
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:01 XioNoX: rename Telia to Arelion in Netbox
  • 07:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:32 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make English Wikipedia read new on templatelinks migration (T306673) (duration: 03m 31s)
  • 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 739920c: Fix missing logo for mniwiktionary and frwikiquote (T317004) (duration: 03m 36s)
  • 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:22 urbanecm@deploy1002: Synchronized static/images/project-logos/: ff2e108: Upload missing logo for mniwiktionary and frwikiquote (T317004) (duration: 03m 50s)
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:19 moritzm: installing ghostscript security updates
  • 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move 10% of traffic to php 7.4 (T271736) (duration: 03m 50s)
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-eqiad:xe-4/1/3
  • 06:28 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-eqiad:xe-4/1/3
  • 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 02:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 02:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33778 and previous config saved to /var/cache/conftool/dbconfig/20220905-024602-ladsgroup.json
  • 00:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 00:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33777 and previous config saved to /var/cache/conftool/dbconfig/20220905-003619-ladsgroup.json
  • 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33776 and previous config saved to /var/cache/conftool/dbconfig/20220905-002112-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33775 and previous config saved to /var/cache/conftool/dbconfig/20220905-000606-ladsgroup.json

2022-09-04

  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33774 and previous config saved to /var/cache/conftool/dbconfig/20220904-235100-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33773 and previous config saved to /var/cache/conftool/dbconfig/20220904-225044-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33772 and previous config saved to /var/cache/conftool/dbconfig/20220904-225016-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33771 and previous config saved to /var/cache/conftool/dbconfig/20220904-223510-ladsgroup.json
  • 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33770 and previous config saved to /var/cache/conftool/dbconfig/20220904-222004-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33769 and previous config saved to /var/cache/conftool/dbconfig/20220904-220457-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33767 and previous config saved to /var/cache/conftool/dbconfig/20220904-155059-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33766 and previous config saved to /var/cache/conftool/dbconfig/20220904-155027-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33765 and previous config saved to /var/cache/conftool/dbconfig/20220904-153521-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33764 and previous config saved to /var/cache/conftool/dbconfig/20220904-152015-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33763 and previous config saved to /var/cache/conftool/dbconfig/20220904-150508-ladsgroup.json
  • 12:51 elukey: reset-fail ifup@ens13.service on idp2002
  • 12:50 elukey: reset-fail ifup@ens13.service on netflow4002
  • 12:49 elukey: pkill remaining processes of user effeietsanders on stat1008 to unblock puppet - T314846
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33762 and previous config saved to /var/cache/conftool/dbconfig/20220904-103427-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T314041)', diff saved to https://phabricator.wikimedia.org/P33761 and previous config saved to /var/cache/conftool/dbconfig/20220904-103405-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T312863)', diff saved to https://phabricator.wikimedia.org/P33760 and previous config saved to /var/cache/conftool/dbconfig/20220904-083341-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance

2022-09-03

  • 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33759 and previous config saved to /var/cache/conftool/dbconfig/20220903-235001-ladsgroup.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33758 and previous config saved to /var/cache/conftool/dbconfig/20220903-233455-ladsgroup.json
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33757 and previous config saved to /var/cache/conftool/dbconfig/20220903-231949-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33756 and previous config saved to /var/cache/conftool/dbconfig/20220903-230443-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312863)', diff saved to https://phabricator.wikimedia.org/P33755 and previous config saved to /var/cache/conftool/dbconfig/20220903-220427-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T312863)', diff saved to https://phabricator.wikimedia.org/P33754 and previous config saved to /var/cache/conftool/dbconfig/20220903-220326-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P33753 and previous config saved to /var/cache/conftool/dbconfig/20220903-214820-ladsgroup.json
  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P33752 and previous config saved to /var/cache/conftool/dbconfig/20220903-213314-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T312863)', diff saved to https://phabricator.wikimedia.org/P33751 and previous config saved to /var/cache/conftool/dbconfig/20220903-211808-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T314041)', diff saved to https://phabricator.wikimedia.org/P33750 and previous config saved to /var/cache/conftool/dbconfig/20220903-180104-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T314041)', diff saved to https://phabricator.wikimedia.org/P33749 and previous config saved to /var/cache/conftool/dbconfig/20220903-180042-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T312863)', diff saved to https://phabricator.wikimedia.org/P33748 and previous config saved to /var/cache/conftool/dbconfig/20220903-151224-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T314041)', diff saved to https://phabricator.wikimedia.org/P33747 and previous config saved to /var/cache/conftool/dbconfig/20220903-015524-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T314041)', diff saved to https://phabricator.wikimedia.org/P33746 and previous config saved to /var/cache/conftool/dbconfig/20220903-015502-ladsgroup.json

2022-09-02

  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:58 dancy@deploy1002: Sync cancelled.
  • 18:56 dancy@deploy1002: dancy: testing T299648 synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:51 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:47 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:40 dancy@deploy1002: Started scap: testing T299648
  • 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1203.eqiad.wmnet with OS bullseye
  • 17:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1203.eqiad.wmnet with reason: host reimage
  • 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1203.eqiad.wmnet with reason: host reimage
  • 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1203.eqiad.wmnet with OS bullseye
  • 17:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS bullseye
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1201.eqiad.wmnet with OS bullseye
  • 16:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
  • 16:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
  • 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: host reimage
  • 16:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1201.eqiad.wmnet with reason: host reimage
  • 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS bullseye
  • 16:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1200.eqiad.wmnet with OS bullseye
  • 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1203']
  • 16:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1201.eqiad.wmnet with OS bullseye
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1203']
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1199.eqiad.wmnet with OS bullseye
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1202']
  • 16:23 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1200.eqiad.wmnet with reason: host reimage
  • 16:19 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1200.eqiad.wmnet with reason: host reimage
  • 16:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1203']
  • 16:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1202']
  • 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
  • 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1203']
  • 16:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1202']
  • 16:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
  • 16:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1200.eqiad.wmnet with OS bullseye
  • 16:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1201']
  • 16:03 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1198.eqiad.wmnet with OS bullseye
  • 16:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1202']
  • 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:57 jayme: repool kubemaster2002
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1199.eqiad.wmnet with OS bullseye
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1200']
  • 15:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
  • 15:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1197.eqiad.wmnet with OS bullseye
  • 15:45 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
  • 15:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1200']
  • 15:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1200']
  • 15:39 jayme: depool kubemaster2002
  • 15:37 jayme: repooled kubemaster1001
  • 15:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1201']
  • 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
  • 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1197.eqiad.wmnet with reason: host reimage
  • 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1198.eqiad.wmnet with OS bullseye
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1201']
  • 15:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1200']
  • 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1196.eqiad.wmnet with OS bullseye
  • 15:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1197.eqiad.wmnet with reason: host reimage
  • 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1198']
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1201']
  • 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1199']
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
  • 15:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db1197.eqiad.wmnet with OS bullseye
  • 15:16 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: host reimage
  • 15:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1199']
  • 15:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1199']
  • 15:13 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1196.eqiad.wmnet with reason: host reimage
  • 15:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1198']
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1199']
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
  • 15:04 jayme: depooled kubemaster1001
  • 15:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1198']
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1198']
  • 15:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1197']
  • 15:00 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1196.eqiad.wmnet with OS bullseye
  • 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003.wikimedia.org
  • 14:58 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1198']
  • 14:55 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:53 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
  • 14:51 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003.wikimedia.org
  • 14:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003.wikimedia.org
  • 14:49 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1197']
  • 14:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1196']
  • 14:46 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:42 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003.wikimedia.org
  • 14:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
  • 14:39 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1196']
  • 14:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 14:21 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db1196']
  • 14:18 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003
  • 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:16 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db1197']
  • 14:15 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003
  • 14:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1197']
  • 14:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 13:57 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1196']
  • 13:57 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 13:31 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db1196']
  • 13:31 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1196']
  • 13:15 jayme: repooled kubemaster1002
  • 13:10 jayme: redepooled kubemaster1002
  • 13:00 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main'.
  • 12:56 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main'.
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MMandere out of all services on: 1235 hosts
  • 10:35 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MMandere out of all services on: 1235 hosts
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MMandere out of all services on: 779 hosts
  • 10:34 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MMandere out of all services on: 779 hosts
  • 10:08 jayme: depooled kubemaster1002 for tests
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T314041)', diff saved to https://phabricator.wikimedia.org/P33743 and previous config saved to /var/cache/conftool/dbconfig/20220902-092704-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 08:41 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:37 fnegri@cumin1001: START - Cookbook sre.dns.netbox
  • 08:36 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 08:26 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
  • 08:13 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp2002.wikimedia.org
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
  • 07:17 dcausse: restarting blazegraph on wdqs1016 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 to clone db1107 T316870', diff saved to https://phabricator.wikimedia.org/P33739 and previous config saved to /var/cache/conftool/dbconfig/20220902-054405-root.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2149 T316494 ', diff saved to https://phabricator.wikimedia.org/P33738 and previous config saved to /var/cache/conftool/dbconfig/20220902-052841-marostegui.json

2022-09-01

  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:50 thcipriani@deploy1002: Finished scap: Backport for Remove Vector grid config (T313559), Disable sticky header edit experiment for idwiki, viwiki (T315264) (duration: 05m 44s)
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:44 thcipriani@deploy1002: thcipriani and cjming and bwang: Backport for Remove Vector grid config (T313559), Disable sticky header edit experiment for idwiki, viwiki (T315264) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:44 thcipriani@deploy1002: Started scap: Backport for Remove Vector grid config (T313559), Disable sticky header edit experiment for idwiki, viwiki (T315264)
  • 20:41 thcipriani@deploy1002: Finished scap: Backport for cirrus: Handle transition to elasticsearch 7.10 (duration: 16m 56s)
  • 20:40 ryankemper: T300943 New hosts are in service and were pooled like so: `sudo confctl select name=elastic20[73-86].* set/weight=10:pooled=yes` (in retrospect that syntax seems to have selected too many hosts, but the final state of pybal is correct per https://config-master.wikimedia.org/pybal/codfw/search)
  • 20:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1203.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1202.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:37 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic20[73-86].*
  • 20:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: T300943
  • 20:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: T300943
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for cirrus: Handle transition to elasticsearch 7.10 synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:24 thcipriani@deploy1002: Started scap: Backport for cirrus: Handle transition to elasticsearch 7.10
  • 20:20 thcipriani@deploy1002: backport aborted: (duration: 03m 09s)
  • 20:20 thcipriani@deploy1002: backport aborted: (duration: 02m 57s)
  • 20:20 thcipriani@deploy1002: sync-world aborted: Backport for Revert "Deploy Research Incentive Survey to idwiki" (duration: 01m 23s)
  • 20:20 thcipriani@deploy1002: thcipriani and trainbranchbot: Backport for Revert "Deploy Research Incentive Survey to idwiki" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:19 thcipriani@deploy1002: Started scap: Backport for Revert "Deploy Research Incentive Survey to idwiki"
  • 20:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1203.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1202.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1201.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1200.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 thcipriani@deploy1002: thcipriani and dani: Backport for Deploy Research Incentive Survey to idwiki (T316466) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 thcipriani@deploy1002: Started scap: Backport for Deploy Research Incentive Survey to idwiki (T316466)
  • 19:58 mutante: otrs1001 - sudo systemctl reset-failed - T316903
  • 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1201.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:46 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1200.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1199.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1198.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1199.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1198.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1197.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:16 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1196.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db1197.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:53 pt1979@cumin1001: START - Cookbook sre.hosts.provision for host db1196.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:52 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1203
  • 18:51 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1203
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1202
  • 18:51 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1202
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1201
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1201
  • 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1200
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1200
  • 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1199
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1199
  • 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1198
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1198
  • 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1197
  • 18:49 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1197
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1196
  • 18:49 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db1196
  • 18:48 pt1979@cumin1001: START - Cookbook sre.dns.netbox
  • 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:42 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.27 refs T314188
  • 17:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:33 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:32 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:32 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:31 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:26 herron: restarted rsyslog on centrallog2002
  • 16:29 topranks: Bringing Lumen Transport CCT 442550294 (cr1-codfw to cr4-ulsfo) back into service following successful hot-cut to lower-latency path with carrier
  • 16:17 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase103[1-3].eqiad.wmnet
  • 15:55 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase103[1-3].eqiad.wmnet
  • 15:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:21 moritzm: installing usb.ids update from Bullseye 11.4 point release
  • 15:19 moritzm: updating docker.io on ml-serve* to bugfix release from Bullseye 11.4 point release
  • 14:54 topranks: Draining traffic from Lumen Transport CCT 442550294 (cr1-codfw to cr4-ulsfo) ahead of hot-cut to lower-latency path with carrier
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 14:07 moritzm: installing net-snmp security updates on Buster
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
  • 14:01 marostegui: test T316744
  • 14:01 marostegui: test T316744
  • 14:00 marostegui: Failover m5 from db1107 to db1183 - T316744
  • 13:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
  • 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
  • 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netbox1002.eqiad.wmnet
  • 13:43 moritzm: rebooting netbox1002 (running netbox.wikimedia.org)
  • 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox1002.eqiad.wmnet
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2002.codfw.wmnet
  • 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox2002.codfw.wmnet
  • 13:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1107,1117,1183].eqiad.wmnet with reason: switchover m5 T316744
  • 13:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1107,1117,1183].eqiad.wmnet with reason: switchover m5 T316744
  • 13:19 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Move 5% of traffic to php 7.4 (T271736) (duration: 03m 45s)
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:00 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:59 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:56 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:29 herron: restarted thanos-query on thanos-fe1001
  • 12:20 cdanis@cumin2002: dbctl commit (dc=all): 'T316482 remove replicas from x2', diff saved to https://phabricator.wikimedia.org/P33736 and previous config saved to /var/cache/conftool/dbconfig/20220901-122026-cdanis.json
  • 12:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve-ctrl1001.eqiad.wmnet
  • 12:13 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-serve-ctrl1001.eqiad.wmnet
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33735 and previous config saved to /var/cache/conftool/dbconfig/20220901-121252-ladsgroup.json
  • 12:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 12:05 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 12:03 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 11:59 moritzm: rebalance row B after completed Bullseye updates T311686
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33734 and previous config saved to /var/cache/conftool/dbconfig/20220901-115746-ladsgroup.json
  • 11:48 cdanis: root@apt1001:/home/cdanis/build-area# reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia conftool_2.2.2-1_amd64.changes
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33733 and previous config saved to /var/cache/conftool/dbconfig/20220901-114239-ladsgroup.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33732 and previous config saved to /var/cache/conftool/dbconfig/20220901-112733-ladsgroup.json
  • 11:04 claime: depooled wtp1035.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2002.codfw.wmnet
  • 10:58 claime: pooled parse1002.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
  • 10:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
  • 10:56 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
  • 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki2002.codfw.wmnet
  • 10:43 claime: depooled wtp1034.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
  • 10:43 claime: pooled parse1001.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
  • 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 10:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1001.eqiad.wmnet
  • 10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1001.eqiad.wmnet
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
  • 10:36 cgoubert@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1002.eqiad.wmnet
  • 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 10:29 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 10:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:13 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1013 back to pc3 master (duration: 03m 43s)
  • 10:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:58 cgoubert@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update wgLinterSubmitterWhitelist (T312638) (duration: 03m 37s)
  • 09:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:32 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc3 master (duration: 03m 34s)
  • 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2015.codfw.wmnet to cluster codfw and group D
  • 08:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Readding downtime removed by reimage
  • 08:17 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: Readding downtime removed by reimage
  • 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2015.codfw.wmnet to cluster codfw and group D
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
  • 07:56 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 1% of traffic to php 7.4 (duration: 03m 42s)
  • 07:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2015.codfw.wmnet with OS bullseye
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2015.codfw.wmnet with reason: host reimage
  • 07:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2015.codfw.wmnet with reason: host reimage
  • 06:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2015.codfw.wmnet with OS bullseye
  • 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:25 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Reverting to no php 7.4 traffic (duration: 03m 44s)
  • 06:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:10 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 1% of users to php 7.4 (duration: 03m 55s)
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1136 T316111', diff saved to https://phabricator.wikimedia.org/P33729 and previous config saved to /var/cache/conftool/dbconfig/20220901-060923-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T316111', diff saved to https://phabricator.wikimedia.org/P33728 and previous config saved to /var/cache/conftool/dbconfig/20220901-060128-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T316111', diff saved to https://phabricator.wikimedia.org/P33727 and previous config saved to /var/cache/conftool/dbconfig/20220901-060100-ladsgroup.json
  • 06:00 Amir1: Starting s7 eqiad failover from db1136 to db1181 - T316111
  • 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1181 with weight 0 T316111', diff saved to https://phabricator.wikimedia.org/P33726 and previous config saved to /var/cache/conftool/dbconfig/20220901-051701-ladsgroup.json
  • 05:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T316111
  • 05:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T316111
  • 01:20 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase201[3-8].codfw.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
  • 00:21 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase201[3-8].codfw.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001

Archives

See Server Admin Log/Archives.