You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23315 and previous config saved to /var/cache/conftool/dbconfig/20220328-005533-ladsgroup.json)
imported>Stashbot
(eileen: civicrm upgraded from dcef393d to e198fb4c)
 
(178 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2022-03-28 ==
== 2022-09-27 ==
* 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23315 and previous config saved to /var/cache/conftool/dbconfig/20220328-005533-ladsgroup.json
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23314 and previous config saved to /var/cache/conftool/dbconfig/20220328-004027-ladsgroup.json
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23313 and previous config saved to /var/cache/conftool/dbconfig/20220328-001707-ladsgroup.json
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 00:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2022-03-27 ==
== 2022-09-26 ==
* 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23312 and previous config saved to /var/cache/conftool/dbconfig/20220327-235516-ladsgroup.json
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23311 and previous config saved to /var/cache/conftool/dbconfig/20220327-234011-ladsgroup.json
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23310 and previous config saved to /var/cache/conftool/dbconfig/20220327-232506-ladsgroup.json
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23309 and previous config saved to /var/cache/conftool/dbconfig/20220327-231001-ladsgroup.json
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23308 and previous config saved to /var/cache/conftool/dbconfig/20220327-224707-ladsgroup.json
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 22:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 22:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23307 and previous config saved to /var/cache/conftool/dbconfig/20220327-224659-ladsgroup.json
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23306 and previous config saved to /var/cache/conftool/dbconfig/20220327-223154-ladsgroup.json
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23305 and previous config saved to /var/cache/conftool/dbconfig/20220327-221649-ladsgroup.json
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23304 and previous config saved to /var/cache/conftool/dbconfig/20220327-220143-ladsgroup.json
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23303 and previous config saved to /var/cache/conftool/dbconfig/20220327-215440-ladsgroup.json
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23302 and previous config saved to /var/cache/conftool/dbconfig/20220327-215432-ladsgroup.json
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23301 and previous config saved to /var/cache/conftool/dbconfig/20220327-213927-ladsgroup.json
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23300 and previous config saved to /var/cache/conftool/dbconfig/20220327-212422-ladsgroup.json
* 20:31 TheresNoTime: closing UTC late backport window
* 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23299 and previous config saved to /var/cache/conftool/dbconfig/20220327-210917-ladsgroup.json
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23298 and previous config saved to /var/cache/conftool/dbconfig/20220327-204604-ladsgroup.json
* 20:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 20:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 20:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 20:20 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
* 20:20 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
* 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23297 and previous config saved to /var/cache/conftool/dbconfig/20220327-195258-ladsgroup.json
* 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23296 and previous config saved to /var/cache/conftool/dbconfig/20220327-193753-ladsgroup.json
* 19:35 _joe_: $ sudo cumin -b1 -s20 'A:mw-api and P<nowiki>{</nowiki>mw13[56-82].eqiad.wmnet<nowiki>}</nowiki>' 'restart-php7.2-fpm'
* 19:25 _joe_: restarting php on mw1380
* 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23295 and previous config saved to /var/cache/conftool/dbconfig/20220327-192247-ladsgroup.json
* 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23294 and previous config saved to /var/cache/conftool/dbconfig/20220327-190742-ladsgroup.json
* 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23293 and previous config saved to /var/cache/conftool/dbconfig/20220327-184107-ladsgroup.json
* 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23292 and previous config saved to /var/cache/conftool/dbconfig/20220327-184059-ladsgroup.json
* 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23291 and previous config saved to /var/cache/conftool/dbconfig/20220327-182554-ladsgroup.json
* 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23290 and previous config saved to /var/cache/conftool/dbconfig/20220327-181049-ladsgroup.json
* 17:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23289 and previous config saved to /var/cache/conftool/dbconfig/20220327-175544-ladsgroup.json
* 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23288 and previous config saved to /var/cache/conftool/dbconfig/20220327-165530-ladsgroup.json
* 16:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 16:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23287 and previous config saved to /var/cache/conftool/dbconfig/20220327-165522-ladsgroup.json
* 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23286 and previous config saved to /var/cache/conftool/dbconfig/20220327-164017-ladsgroup.json
* 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23285 and previous config saved to /var/cache/conftool/dbconfig/20220327-162511-ladsgroup.json
* 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23284 and previous config saved to /var/cache/conftool/dbconfig/20220327-161006-ladsgroup.json
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23283 and previous config saved to /var/cache/conftool/dbconfig/20220327-154357-ladsgroup.json
* 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 15:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 14:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23282 and previous config saved to /var/cache/conftool/dbconfig/20220327-145341-ladsgroup.json
* 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23281 and previous config saved to /var/cache/conftool/dbconfig/20220327-143835-ladsgroup.json
* 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23280 and previous config saved to /var/cache/conftool/dbconfig/20220327-142330-ladsgroup.json
* 14:20 elukey: roll restart of wqds-blazegraph-public codfw
* 14:18 elukey: restart blazegraph on wdqs2003
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23279 and previous config saved to /var/cache/conftool/dbconfig/20220327-140825-ladsgroup.json
* 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23278 and previous config saved to /var/cache/conftool/dbconfig/20220327-134411-ladsgroup.json
* 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23277 and previous config saved to /var/cache/conftool/dbconfig/20220327-134358-ladsgroup.json
* 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23276 and previous config saved to /var/cache/conftool/dbconfig/20220327-132852-ladsgroup.json
* 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23275 and previous config saved to /var/cache/conftool/dbconfig/20220327-131347-ladsgroup.json
* 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23274 and previous config saved to /var/cache/conftool/dbconfig/20220327-125842-ladsgroup.json
* 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23273 and previous config saved to /var/cache/conftool/dbconfig/20220327-125128-ladsgroup.json
* 12:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23272 and previous config saved to /var/cache/conftool/dbconfig/20220327-125120-ladsgroup.json
* 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23271 and previous config saved to /var/cache/conftool/dbconfig/20220327-123615-ladsgroup.json
* 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23270 and previous config saved to /var/cache/conftool/dbconfig/20220327-122110-ladsgroup.json
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23269 and previous config saved to /var/cache/conftool/dbconfig/20220327-120604-ladsgroup.json
* 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23268 and previous config saved to /var/cache/conftool/dbconfig/20220327-114152-ladsgroup.json
* 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23267 and previous config saved to /var/cache/conftool/dbconfig/20220327-112003-ladsgroup.json
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23266 and previous config saved to /var/cache/conftool/dbconfig/20220327-110457-ladsgroup.json
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23265 and previous config saved to /var/cache/conftool/dbconfig/20220327-104952-ladsgroup.json
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23264 and previous config saved to /var/cache/conftool/dbconfig/20220327-103447-ladsgroup.json
* 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23263 and previous config saved to /var/cache/conftool/dbconfig/20220327-101022-ladsgroup.json
* 10:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 10:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23262 and previous config saved to /var/cache/conftool/dbconfig/20220327-101014-ladsgroup.json
* 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23261 and previous config saved to /var/cache/conftool/dbconfig/20220327-095509-ladsgroup.json
* 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23260 and previous config saved to /var/cache/conftool/dbconfig/20220327-094004-ladsgroup.json
* 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23259 and previous config saved to /var/cache/conftool/dbconfig/20220327-092459-ladsgroup.json
* 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23258 and previous config saved to /var/cache/conftool/dbconfig/20220327-085741-ladsgroup.json
* 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23257 and previous config saved to /var/cache/conftool/dbconfig/20220327-085733-ladsgroup.json
* 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23256 and previous config saved to /var/cache/conftool/dbconfig/20220327-084228-ladsgroup.json
* 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23255 and previous config saved to /var/cache/conftool/dbconfig/20220327-082723-ladsgroup.json
* 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23254 and previous config saved to /var/cache/conftool/dbconfig/20220327-081218-ladsgroup.json
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23253 and previous config saved to /var/cache/conftool/dbconfig/20220327-071203-ladsgroup.json
* 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 07:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23252 and previous config saved to /var/cache/conftool/dbconfig/20220327-071156-ladsgroup.json
* 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23251 and previous config saved to /var/cache/conftool/dbconfig/20220327-065651-ladsgroup.json
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23250 and previous config saved to /var/cache/conftool/dbconfig/20220327-064146-ladsgroup.json
* 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23249 and previous config saved to /var/cache/conftool/dbconfig/20220327-062641-ladsgroup.json
* 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23248 and previous config saved to /var/cache/conftool/dbconfig/20220327-055108-ladsgroup.json
* 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23247 and previous config saved to /var/cache/conftool/dbconfig/20220327-055100-ladsgroup.json
* 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23246 and previous config saved to /var/cache/conftool/dbconfig/20220327-053555-ladsgroup.json
* 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23245 and previous config saved to /var/cache/conftool/dbconfig/20220327-052050-ladsgroup.json
* 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23244 and previous config saved to /var/cache/conftool/dbconfig/20220327-050545-ladsgroup.json
* 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23243 and previous config saved to /var/cache/conftool/dbconfig/20220327-044235-ladsgroup.json
* 04:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 04:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 04:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 04:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23242 and previous config saved to /var/cache/conftool/dbconfig/20220327-042041-ladsgroup.json
* 04:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23241 and previous config saved to /var/cache/conftool/dbconfig/20220327-040536-ladsgroup.json
* 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23240 and previous config saved to /var/cache/conftool/dbconfig/20220327-035031-ladsgroup.json
* 03:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23239 and previous config saved to /var/cache/conftool/dbconfig/20220327-033526-ladsgroup.json
* 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23238 and previous config saved to /var/cache/conftool/dbconfig/20220327-031115-ladsgroup.json
* 03:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 03:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23237 and previous config saved to /var/cache/conftool/dbconfig/20220327-031108-ladsgroup.json
* 02:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23236 and previous config saved to /var/cache/conftool/dbconfig/20220327-025603-ladsgroup.json
* 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23235 and previous config saved to /var/cache/conftool/dbconfig/20220327-024057-ladsgroup.json
* 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23234 and previous config saved to /var/cache/conftool/dbconfig/20220327-022552-ladsgroup.json
* 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23233 and previous config saved to /var/cache/conftool/dbconfig/20220327-015848-ladsgroup.json
* 01:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 01:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23232 and previous config saved to /var/cache/conftool/dbconfig/20220327-015840-ladsgroup.json
* 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23231 and previous config saved to /var/cache/conftool/dbconfig/20220327-014335-ladsgroup.json
* 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23230 and previous config saved to /var/cache/conftool/dbconfig/20220327-012829-ladsgroup.json
* 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23229 and previous config saved to /var/cache/conftool/dbconfig/20220327-011324-ladsgroup.json
* 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23228 and previous config saved to /var/cache/conftool/dbconfig/20220327-005010-ladsgroup.json
* 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 00:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 00:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 00:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 00:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 00:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 00:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 00:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23227 and previous config saved to /var/cache/conftool/dbconfig/20220327-000023-ladsgroup.json
 
== 2022-03-26 ==
* 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23226 and previous config saved to /var/cache/conftool/dbconfig/20220326-234517-ladsgroup.json
* 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23225 and previous config saved to /var/cache/conftool/dbconfig/20220326-233012-ladsgroup.json
* 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23224 and previous config saved to /var/cache/conftool/dbconfig/20220326-231507-ladsgroup.json
* 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23223 and previous config saved to /var/cache/conftool/dbconfig/20220326-224955-ladsgroup.json
* 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 22:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23222 and previous config saved to /var/cache/conftool/dbconfig/20220326-224947-ladsgroup.json
* 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23221 and previous config saved to /var/cache/conftool/dbconfig/20220326-223442-ladsgroup.json
* 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23220 and previous config saved to /var/cache/conftool/dbconfig/20220326-221937-ladsgroup.json
* 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23219 and previous config saved to /var/cache/conftool/dbconfig/20220326-220432-ladsgroup.json
* 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23218 and previous config saved to /var/cache/conftool/dbconfig/20220326-210417-ladsgroup.json
* 21:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 21:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23217 and previous config saved to /var/cache/conftool/dbconfig/20220326-210409-ladsgroup.json
* 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23216 and previous config saved to /var/cache/conftool/dbconfig/20220326-204904-ladsgroup.json
* 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23214 and previous config saved to /var/cache/conftool/dbconfig/20220326-203359-ladsgroup.json
* 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23213 and previous config saved to /var/cache/conftool/dbconfig/20220326-201854-ladsgroup.json
* 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23212 and previous config saved to /var/cache/conftool/dbconfig/20220326-195245-ladsgroup.json
* 19:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 19:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 19:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 19:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 19:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23211 and previous config saved to /var/cache/conftool/dbconfig/20220326-190244-ladsgroup.json
* 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23210 and previous config saved to /var/cache/conftool/dbconfig/20220326-184739-ladsgroup.json
* 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23209 and previous config saved to /var/cache/conftool/dbconfig/20220326-183234-ladsgroup.json
* 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23208 and previous config saved to /var/cache/conftool/dbconfig/20220326-181729-ladsgroup.json
* 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23207 and previous config saved to /var/cache/conftool/dbconfig/20220326-175315-ladsgroup.json
* 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23206 and previous config saved to /var/cache/conftool/dbconfig/20220326-175302-ladsgroup.json
* 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23205 and previous config saved to /var/cache/conftool/dbconfig/20220326-173757-ladsgroup.json
* 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23204 and previous config saved to /var/cache/conftool/dbconfig/20220326-172250-ladsgroup.json
* 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23203 and previous config saved to /var/cache/conftool/dbconfig/20220326-170745-ladsgroup.json
* 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23202 and previous config saved to /var/cache/conftool/dbconfig/20220326-170047-ladsgroup.json
* 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 17:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23201 and previous config saved to /var/cache/conftool/dbconfig/20220326-170039-ladsgroup.json
* 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23200 and previous config saved to /var/cache/conftool/dbconfig/20220326-164534-ladsgroup.json
* 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23199 and previous config saved to /var/cache/conftool/dbconfig/20220326-163029-ladsgroup.json
* 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23198 and previous config saved to /var/cache/conftool/dbconfig/20220326-161523-ladsgroup.json
* 16:00 Amir1: start of mwscript maintenance/migrateLinksTable.php --wiki enwiki --table templatelinks --sleep 2 on beta cluster ([[phab:T299424|T299424]])
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23197 and previous config saved to /var/cache/conftool/dbconfig/20220326-155025-ladsgroup.json
* 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23196 and previous config saved to /var/cache/conftool/dbconfig/20220326-152835-ladsgroup.json
* 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23195 and previous config saved to /var/cache/conftool/dbconfig/20220326-151330-ladsgroup.json
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23194 and previous config saved to /var/cache/conftool/dbconfig/20220326-145825-ladsgroup.json
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23193 and previous config saved to /var/cache/conftool/dbconfig/20220326-144320-ladsgroup.json
* 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23192 and previous config saved to /var/cache/conftool/dbconfig/20220326-141912-ladsgroup.json
* 14:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 14:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23191 and previous config saved to /var/cache/conftool/dbconfig/20220326-141904-ladsgroup.json
* 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23190 and previous config saved to /var/cache/conftool/dbconfig/20220326-140359-ladsgroup.json
* 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23189 and previous config saved to /var/cache/conftool/dbconfig/20220326-134854-ladsgroup.json
* 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23188 and previous config saved to /var/cache/conftool/dbconfig/20220326-133349-ladsgroup.json
* 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23187 and previous config saved to /var/cache/conftool/dbconfig/20220326-130701-ladsgroup.json
* 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23186 and previous config saved to /var/cache/conftool/dbconfig/20220326-130653-ladsgroup.json
* 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23185 and previous config saved to /var/cache/conftool/dbconfig/20220326-125148-ladsgroup.json
* 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23184 and previous config saved to /var/cache/conftool/dbconfig/20220326-123643-ladsgroup.json
* 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23183 and previous config saved to /var/cache/conftool/dbconfig/20220326-122136-ladsgroup.json
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23182 and previous config saved to /var/cache/conftool/dbconfig/20220326-112122-ladsgroup.json
* 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23181 and previous config saved to /var/cache/conftool/dbconfig/20220326-112114-ladsgroup.json
* 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23180 and previous config saved to /var/cache/conftool/dbconfig/20220326-110609-ladsgroup.json
* 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23179 and previous config saved to /var/cache/conftool/dbconfig/20220326-105104-ladsgroup.json
* 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23178 and previous config saved to /var/cache/conftool/dbconfig/20220326-103559-ladsgroup.json
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23177 and previous config saved to /var/cache/conftool/dbconfig/20220326-100918-ladsgroup.json
* 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23176 and previous config saved to /var/cache/conftool/dbconfig/20220326-100911-ladsgroup.json
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23175 and previous config saved to /var/cache/conftool/dbconfig/20220326-095405-ladsgroup.json
* 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23174 and previous config saved to /var/cache/conftool/dbconfig/20220326-093900-ladsgroup.json
* 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23173 and previous config saved to /var/cache/conftool/dbconfig/20220326-092355-ladsgroup.json
* 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23172 and previous config saved to /var/cache/conftool/dbconfig/20220326-085938-ladsgroup.json
* 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 08:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23171 and previous config saved to /var/cache/conftool/dbconfig/20220326-083731-ladsgroup.json
* 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23170 and previous config saved to /var/cache/conftool/dbconfig/20220326-082225-ladsgroup.json
* 08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23169 and previous config saved to /var/cache/conftool/dbconfig/20220326-080720-ladsgroup.json
* 07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23168 and previous config saved to /var/cache/conftool/dbconfig/20220326-075215-ladsgroup.json
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23167 and previous config saved to /var/cache/conftool/dbconfig/20220326-072702-ladsgroup.json
* 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23166 and previous config saved to /var/cache/conftool/dbconfig/20220326-072654-ladsgroup.json
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23165 and previous config saved to /var/cache/conftool/dbconfig/20220326-071149-ladsgroup.json
* 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23164 and previous config saved to /var/cache/conftool/dbconfig/20220326-065644-ladsgroup.json
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23163 and previous config saved to /var/cache/conftool/dbconfig/20220326-064139-ladsgroup.json
* 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23162 and previous config saved to /var/cache/conftool/dbconfig/20220326-062131-ladsgroup.json
* 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23161 and previous config saved to /var/cache/conftool/dbconfig/20220326-062123-ladsgroup.json
* 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23160 and previous config saved to /var/cache/conftool/dbconfig/20220326-060618-ladsgroup.json
* 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23159 and previous config saved to /var/cache/conftool/dbconfig/20220326-055113-ladsgroup.json
* 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23158 and previous config saved to /var/cache/conftool/dbconfig/20220326-053607-ladsgroup.json
* 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23157 and previous config saved to /var/cache/conftool/dbconfig/20220326-051140-ladsgroup.json
* 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 04:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 04:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 04:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 04:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 04:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 04:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 04:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 04:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 04:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23156 and previous config saved to /var/cache/conftool/dbconfig/20220326-042136-ladsgroup.json
* 04:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23155 and previous config saved to /var/cache/conftool/dbconfig/20220326-040631-ladsgroup.json
* 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23154 and previous config saved to /var/cache/conftool/dbconfig/20220326-035126-ladsgroup.json
* 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23153 and previous config saved to /var/cache/conftool/dbconfig/20220326-033621-ladsgroup.json
* 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23152 and previous config saved to /var/cache/conftool/dbconfig/20220326-025754-ladsgroup.json
* 02:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 02:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23151 and previous config saved to /var/cache/conftool/dbconfig/20220326-025746-ladsgroup.json
* 02:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23150 and previous config saved to /var/cache/conftool/dbconfig/20220326-024241-ladsgroup.json
* 02:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23149 and previous config saved to /var/cache/conftool/dbconfig/20220326-022736-ladsgroup.json
* 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23148 and previous config saved to /var/cache/conftool/dbconfig/20220326-021231-ladsgroup.json
* 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23147 and previous config saved to /var/cache/conftool/dbconfig/20220326-011216-ladsgroup.json
* 01:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 01:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23146 and previous config saved to /var/cache/conftool/dbconfig/20220326-011209-ladsgroup.json
* 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23145 and previous config saved to /var/cache/conftool/dbconfig/20220326-005704-ladsgroup.json
* 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23144 and previous config saved to /var/cache/conftool/dbconfig/20220326-004159-ladsgroup.json
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23143 and previous config saved to /var/cache/conftool/dbconfig/20220326-002653-ladsgroup.json
 
== 2022-03-25 ==
* 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23142 and previous config saved to /var/cache/conftool/dbconfig/20220325-235855-ladsgroup.json
* 23:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 23:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 23:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23141 and previous config saved to /var/cache/conftool/dbconfig/20220325-230540-ladsgroup.json
* 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23140 and previous config saved to /var/cache/conftool/dbconfig/20220325-225035-ladsgroup.json
* 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23139 and previous config saved to /var/cache/conftool/dbconfig/20220325-223530-ladsgroup.json
* 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23138 and previous config saved to /var/cache/conftool/dbconfig/20220325-222025-ladsgroup.json
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23137 and previous config saved to /var/cache/conftool/dbconfig/20220325-215400-ladsgroup.json
* 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23136 and previous config saved to /var/cache/conftool/dbconfig/20220325-215346-ladsgroup.json
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23135 and previous config saved to /var/cache/conftool/dbconfig/20220325-213841-ladsgroup.json
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23134 and previous config saved to /var/cache/conftool/dbconfig/20220325-212336-ladsgroup.json
* 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23133 and previous config saved to /var/cache/conftool/dbconfig/20220325-210831-ladsgroup.json
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23132 and previous config saved to /var/cache/conftool/dbconfig/20220325-210136-ladsgroup.json
* 21:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 21:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23131 and previous config saved to /var/cache/conftool/dbconfig/20220325-210128-ladsgroup.json
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23130 and previous config saved to /var/cache/conftool/dbconfig/20220325-204623-ladsgroup.json
* 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23129 and previous config saved to /var/cache/conftool/dbconfig/20220325-203118-ladsgroup.json
* 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23128 and previous config saved to /var/cache/conftool/dbconfig/20220325-201613-ladsgroup.json
* 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23127 and previous config saved to /var/cache/conftool/dbconfig/20220325-195137-ladsgroup.json
* 19:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23126 and previous config saved to /var/cache/conftool/dbconfig/20220325-192923-ladsgroup.json
* 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23125 and previous config saved to /var/cache/conftool/dbconfig/20220325-191416-ladsgroup.json
* 19:10 mutante: copying dump from deploy server to dumps server: scp -3 deploy1002.eqiad.wmnet:/srv/miscweb/static-bugzilla.tar.gz labstore1006.wikimedia.org:~ ([[phab:T284193|T284193]])
* 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23124 and previous config saved to /var/cache/conftool/dbconfig/20220325-185911-ladsgroup.json
* 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23123 and previous config saved to /var/cache/conftool/dbconfig/20220325-184406-ladsgroup.json
* 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23122 and previous config saved to /var/cache/conftool/dbconfig/20220325-181439-ladsgroup.json
* 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23121 and previous config saved to /var/cache/conftool/dbconfig/20220325-181431-ladsgroup.json
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23120 and previous config saved to /var/cache/conftool/dbconfig/20220325-175926-ladsgroup.json
* 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23119 and previous config saved to /var/cache/conftool/dbconfig/20220325-174421-ladsgroup.json
* 17:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
* 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23118 and previous config saved to /var/cache/conftool/dbconfig/20220325-172916-ladsgroup.json
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23117 and previous config saved to /var/cache/conftool/dbconfig/20220325-170154-ladsgroup.json
* 17:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23116 and previous config saved to /var/cache/conftool/dbconfig/20220325-170146-ladsgroup.json
* 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23115 and previous config saved to /var/cache/conftool/dbconfig/20220325-164641-ladsgroup.json
* 16:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23114 and previous config saved to /var/cache/conftool/dbconfig/20220325-163136-ladsgroup.json
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23112 and previous config saved to /var/cache/conftool/dbconfig/20220325-161631-ladsgroup.json
* 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23111 and previous config saved to /var/cache/conftool/dbconfig/20220325-154705-ladsgroup.json
* 15:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23110 and previous config saved to /var/cache/conftool/dbconfig/20220325-154658-ladsgroup.json
* 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23109 and previous config saved to /var/cache/conftool/dbconfig/20220325-153152-ladsgroup.json
* 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23108 and previous config saved to /var/cache/conftool/dbconfig/20220325-151647-ladsgroup.json
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23107 and previous config saved to /var/cache/conftool/dbconfig/20220325-150141-ladsgroup.json
* 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23101 and previous config saved to /var/cache/conftool/dbconfig/20220325-143545-ladsgroup.json
* 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23100 and previous config saved to /var/cache/conftool/dbconfig/20220325-141301-ladsgroup.json
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23099 and previous config saved to /var/cache/conftool/dbconfig/20220325-140850-root.json
* 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23098 and previous config saved to /var/cache/conftool/dbconfig/20220325-135756-ladsgroup.json
* 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23097 and previous config saved to /var/cache/conftool/dbconfig/20220325-135346-root.json
* 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23096 and previous config saved to /var/cache/conftool/dbconfig/20220325-134251-ladsgroup.json
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23095 and previous config saved to /var/cache/conftool/dbconfig/20220325-133842-root.json
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23094 and previous config saved to /var/cache/conftool/dbconfig/20220325-132746-ladsgroup.json
* 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23093 and previous config saved to /var/cache/conftool/dbconfig/20220325-132338-root.json
* 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23092 and previous config saved to /var/cache/conftool/dbconfig/20220325-130834-root.json
* 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23091 and previous config saved to /var/cache/conftool/dbconfig/20220325-130146-ladsgroup.json
* 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23090 and previous config saved to /var/cache/conftool/dbconfig/20220325-130138-ladsgroup.json
* 12:49 hoo: Updated operations/dumps/dcat on snapshot10(08{{!}}09{{!}}11{{!}}12{{!}}13) from {{Gerrit|d4886f6}} to {{Gerrit|a1f46e4}}
* 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23089 and previous config saved to /var/cache/conftool/dbconfig/20220325-124633-ladsgroup.json
* 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23088 and previous config saved to /var/cache/conftool/dbconfig/20220325-123128-ladsgroup.json
* 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23086 and previous config saved to /var/cache/conftool/dbconfig/20220325-121623-ladsgroup.json
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23085 and previous config saved to /var/cache/conftool/dbconfig/20220325-120708-ladsgroup.json
* 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23084 and previous config saved to /var/cache/conftool/dbconfig/20220325-120701-ladsgroup.json
* 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23083 and previous config saved to /var/cache/conftool/dbconfig/20220325-115156-ladsgroup.json
* 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23082 and previous config saved to /var/cache/conftool/dbconfig/20220325-113651-ladsgroup.json
* 11:24 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23081 and previous config saved to /var/cache/conftool/dbconfig/20220325-112145-ladsgroup.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23080 and previous config saved to /var/cache/conftool/dbconfig/20220325-110217-marostegui.json
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23079 and previous config saved to /var/cache/conftool/dbconfig/20220325-104712-marostegui.json
* 10:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
* 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23078 and previous config saved to /var/cache/conftool/dbconfig/20220325-103310-ladsgroup.json
* 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23077 and previous config saved to /var/cache/conftool/dbconfig/20220325-103207-marostegui.json
* 10:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
* 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23076 and previous config saved to /var/cache/conftool/dbconfig/20220325-101701-marostegui.json
* 10:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
* 10:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23075 and previous config saved to /var/cache/conftool/dbconfig/20220325-101016-ladsgroup.json
* 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23074 and previous config saved to /var/cache/conftool/dbconfig/20220325-095511-ladsgroup.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23073 and previous config saved to /var/cache/conftool/dbconfig/20220325-094031-marostegui.json
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23072 and previous config saved to /var/cache/conftool/dbconfig/20220325-094023-marostegui.json
* 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23071 and previous config saved to /var/cache/conftool/dbconfig/20220325-094006-ladsgroup.json
* 09:27 moritzm: updating libapache2-mod-auth-cas on moscovium/debmonitor1002
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23070 and previous config saved to /var/cache/conftool/dbconfig/20220325-092518-marostegui.json
* 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23069 and previous config saved to /var/cache/conftool/dbconfig/20220325-092500-ladsgroup.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23068 and previous config saved to /var/cache/conftool/dbconfig/20220325-091013-marostegui.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23067 and previous config saved to /var/cache/conftool/dbconfig/20220325-085508-marostegui.json
* 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23066 and previous config saved to /var/cache/conftool/dbconfig/20220325-082446-ladsgroup.json
* 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23065 and previous config saved to /var/cache/conftool/dbconfig/20220325-080403-marostegui.json
* 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23064 and previous config saved to /var/cache/conftool/dbconfig/20220325-080355-marostegui.json
* 08:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 08:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23063 and previous config saved to /var/cache/conftool/dbconfig/20220325-075610-ladsgroup.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23062 and previous config saved to /var/cache/conftool/dbconfig/20220325-074850-marostegui.json
* 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23061 and previous config saved to /var/cache/conftool/dbconfig/20220325-074105-ladsgroup.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23060 and previous config saved to /var/cache/conftool/dbconfig/20220325-073345-marostegui.json
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23059 and previous config saved to /var/cache/conftool/dbconfig/20220325-072559-ladsgroup.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23058 and previous config saved to /var/cache/conftool/dbconfig/20220325-071840-marostegui.json
* 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23057 and previous config saved to /var/cache/conftool/dbconfig/20220325-071054-ladsgroup.json
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23056 and previous config saved to /var/cache/conftool/dbconfig/20220325-064139-ladsgroup.json
* 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 06:31 _joe_: deleting a couple zotero pods with excessive number of restarts
* 06:29 marostegui: dbmaint s4@eqiad [[phab:T300775|T300775]]
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P23055 and previous config saved to /var/cache/conftool/dbconfig/20220325-060723-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23054 and previous config saved to /var/cache/conftool/dbconfig/20220325-054705-marostegui.json
* 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for testing', diff saved to https://phabricator.wikimedia.org/P23053 and previous config saved to /var/cache/conftool/dbconfig/20220325-053037-marostegui.json
* 00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2027.codfw.wmnet with OS buster
 
== 2022-03-24 ==
* 23:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS buster
* 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED
* 22:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23050 and previous config saved to /var/cache/conftool/dbconfig/20220324-223031-marostegui.json
* 22:19 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23049 and previous config saved to /var/cache/conftool/dbconfig/20220324-221526-marostegui.json
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED
* 22:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
* 22:07 ebernhardson: restart wcqs-blazegraph on wcqs2001 to resolve intermittant BlazegraphFreeAllocatorsDecreasingRapidly
* 22:06 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
* 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23048 and previous config saved to /var/cache/conftool/dbconfig/20220324-220021-marostegui.json
* 21:54 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23047 and previous config saved to /var/cache/conftool/dbconfig/20220324-214515-marostegui.json
* 21:42 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 21:13 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 inflatador: bking@cumin1001 restarting blazegraph on wdqs[1003-1013].eqiad.wmnet for [[phab:T293862|T293862]]
* 21:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43385320f417052d8e60791b3cb970e6e3f088d5}}: fawiki: Set celebration logo for new vector ([[phab:T304314|T304314]]; 2/2) (duration: 00m 53s)
* 21:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-fawiki-new-year.png: {{Gerrit|43385320f417052d8e60791b3cb970e6e3f088d5}}: fawiki: Set celebration logo for new vector ([[phab:T304314|T304314]]; 1/2) (duration: 00m 50s)
* 21:07 thcipriani@deploy1002: Finished deploy [releng/phatality@15f8ec0]: Deploying phatality updates for opensearch 1.2.0 (duration: 00m 13s)
* 21:07 thcipriani@deploy1002: Started deploy [releng/phatality@15f8ec0]: Deploying phatality updates for opensearch 1.2.0
* 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:03 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 50s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:773607{{!}}Start writing to $wmgAllServices the same value as to $wmfAllServices (T45956)]] (duration: 01m 17s)
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:768255{{!}}Stop writing to certain $wmf* global variables (T45956)]] (part 3) (duration: 00m 55s)
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:29 thcipriani@deploy1002: Synchronized docroot/noc/db.php: Config: [[gerrit:768255{{!}}Stop writing to certain $wmf* global variables (T45956)]] (part II) (duration: 00m 51s)
* 20:28 thcipriani@deploy1002: Synchronized tests: Config: [[gerrit:768255{{!}}Stop writing to certain $wmf* global variables (T45956)]] (part I) (duration: 00m 50s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:23 thcipriani@deploy1002: Synchronized portals: Config: [[gerrit:773380{{!}}Bumping portals to master (T282012)]] (duration: 00m 52s)
* 20:22 thcipriani@deploy1002: Synchronized portals/wikipedia.org/assets: Config: [[gerrit:773380{{!}}Bumping portals to master (T282012)]] (duration: 00m 52s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 20:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23045 and previous config saved to /var/cache/conftool/dbconfig/20220324-201305-marostegui.json
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 20:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302658|T302658]])', diff saved to https://phabricator.wikimedia.org/P23044 and previous config saved to /var/cache/conftool/dbconfig/20220324-201257-marostegui.json
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 20:08 thcipriani@deploy1002: Synchronized wmf-config/
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services


== 2022-03-23 ==
== 2022-09-22 ==
* 23:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 23:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bullseye
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bullseye
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bullseye
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:23 dancy@deploy1002: backport aborted:  (duration: 00m 05s)
* 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:38 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3 refs [[phab:T300203|T300203]]
* 20:55 brennen: end of utc late backport & config window
* 23:34 brennen: trainsperiment ([[phab:T300203|T300203]]): reverting to 1.39.0-wmf.3 on all wikis for [[phab:T304564|T304564]]; will move forward again after a fix.
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:25 cwhite: remove openjdk-8-jre from codfw logstash nodes [[phab:T301770|T301770]]
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 23:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 22:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 22:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bullseye
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 22:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bullseye
* 20:36 brennen@deploy1002: backport aborted:  (duration: 02m 16s)
* 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bullseye
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bullseye
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bullseye
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:42 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 21:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:31 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:773331{{!}}Enable split A/B testing on beta cluster (T301584)]] (duration: 00m 50s)
* 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bullseye
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:772408{{!}}Allow autoconfirmed users to view basic IP information (T303858)]] and [[gerrit:767216{{!}}Enable IPInfo on testwiki (T260598)]] (duration: 00m 50s)
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bullseye
* 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bullseye
* 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bullseye
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 catrope@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:771448{{!}}DynamicSidebar: remove unused extension (T304006)]] (duration: 00m 49s)
* 20:34 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771447{{!}}DynamicSidebar: remove from InitialiseSettings]] (duration: 00m 51s)
* 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
* 20:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
* 20:32 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bullseye
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:18 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bullseye
* 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bullseye
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:771444{{!}}DynamicSidebar: remove from CommonSettings (T304006)]] (duration: 00m 50s)
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771443{{!}}wikitech: Remove DynamicSidebar (T304006)]] (duration: 00m 52s)
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 20:01 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 18:38 dancy@deploy1002: Started scap: testing
* 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 19:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 19:37 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.4 on all wikis; logs seem clean - end of train deployment activities for the week, unless bugs emerge
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:23 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bullseye
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bullseye
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:10 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - [[phab:T301956|T301956]]
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:39 dancy@deploy1002: Sync cancelled.
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
 
== 2022-09-21 ==
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 tgr_: UTC late deploys done
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]] (duration: 00m 52s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:59 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 18:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 18:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:47 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.4 on testwikis; proceeding to groups 0-2 with 15 minute intervals for watching logs
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 18:46 brennen@deploy1002: Pruned MediaWiki: 1.38.0-wmf.26 (duration: 02m 05s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 18:42 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]] (duration: 49m 41s)
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bullseye
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bullseye
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:52 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.4  refs [[phab:T300203|T300203]]
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 17:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bullseye
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:48 brennen: trainsperiment ([[phab:T300203|T300203]]): starting prep for 1.39.0-wmf.4
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bullseye
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 17:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bullseye
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bullseye
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:58 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 16:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 16:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bullseye
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul
* 16:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
 
* 16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
== 2022-09-20 ==
* 16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
* 20:19 cjming: end of UTC late backport window
* 15:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bullseye
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:39 urbanecm: foreachwikiindblist wikipedia extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # [[phab:T304052|T304052]]
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 15:38 urbanecm: Created shnwikivoyage and guwwiki
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:31 mmandere: pool cp1080 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS buster
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:27 urbanecm@deploy1002: Synchronized langlist: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 04s)
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 15:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 07s)
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 15:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 05s)
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 15:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bullseye
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 15:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 06s)
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 15:23 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiki ([[phab:T303727|T303727]])
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 15:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 15:21 urbanecm@deploy1002: Synchronized dblists: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 10s)
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:50 jynus: restart db2100:s7 to apply new config
* 15:19 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiki ([[phab:T303727|T303727]]) (duration: 01m 05s)
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 15:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 15:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 15:12 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shnwikivoyage ([[phab:T302797|T302797]])
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 15:09 urbanecm@deploy1002: Synchronized dblists: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:08 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating shnwikivoyage ([[phab:T302797|T302797]]) (duration: 01m 05s)
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:01 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]]
* 15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1030.eqiad.wmnet with OS bullseye
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 14:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 14:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 14:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 14:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bullseye
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 14:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 14:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS buster
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 14:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.3/extensions/WikimediaMaintenance/addWiki.php: {{Gerrit|9a0aed0}}: addWiki: Create GrowthExperiment tables for all new Wikipedias ([[phab:T304052|T304052]]) (duration: 01m 06s)
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 14:38 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 14:37 mmandere: depool cp1080 for reimage - [[phab:T290005|T290005]]
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 14:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1030.eqiad.wmnet with OS bullseye
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:42 dancy@deploy1002: Sync cancelled.
* 14:28 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 14:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 14:23 bblack: reboot cp1085 (downtimed)
* 16:09 awight@deploy1002: backport aborted:  (duration: 00m 33s)
* 14:20 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 14:19 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wcqs1002.eqiad.wmnet
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1029.eqiad.wmnet with OS bullseye
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:11 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1027.eqiad.wmnet with OS bullseye
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 mmandere: pool cp1082 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: {{Gerrit|1ac09d4709c645558f644a885fadc49c05cc04b9}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 39s)
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: {{Gerrit|1a27e05a7ca53a063d5f9e284d6a09546ac8691c}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 52s)
* 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS buster
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
* 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0b55db6f80df5f4c89f969332a6b31077a7172c4}}: Enable Tech Wishes survey on dewiki ([[phab:T316676|T316676]]) (duration: 04m 12s)
* 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
* 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
* 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
* 13:48 Lucas_WMDE: UTC afternoon backport window done
* 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:35 hashar: Restarted CI Jenkins for plugin update
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:773209{{!}}Enable Wikibase REST API on beta wikidata (T302959)]] (2/2, production no-op) (duration: 01m 05s)
* 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:773209{{!}}Enable Wikibase REST API on beta wikidata (T302959)]] (1/2, production no-op) (duration: 01m 07s)
* 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832993{{!}}testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias (T317289)]] (duration: 03m 46s)
* 13:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1029.eqiad.wmnet with OS bullseye
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23010 and previous config saved to /var/cache/conftool/dbconfig/20220323-134153-marostegui.json
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:10 kart_: Updated cxserver to 2022-09-15-113346-production ([[phab:T317289|T317289]], [[phab:T315209|T315209]])
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23009 and previous config saved to /var/cache/conftool/dbconfig/20220323-134140-marostegui.json
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:768090{{!}}Write "unexpectedUnconnectedPage" page prop on Test Wikidata clients]] (duration: 01m 10s)
* 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:38 moritzm: restarting superset for OpenSSL update
* 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 13:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1027.eqiad.wmnet with OS bullseye
* 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 13:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
* 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 13:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23008 and previous config saved to /var/cache/conftool/dbconfig/20220323-132635-marostegui.json
* 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
* 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS buster
* 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23005 and previous config saved to /var/cache/conftool/dbconfig/20220323-131130-marostegui.json
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:07 mmandere: depool cp1082 for reimage - [[phab:T290005|T290005]]
* 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:58 moritzm: installing bind security updates
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23004 and previous config saved to /var/cache/conftool/dbconfig/20220323-125625-marostegui.json
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
* 12:29 moritzm: restarting Turnilo for OpenSSL update
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 36m 08s)
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P23003 and previous config saved to /var/cache/conftool/dbconfig/20220323-120749-marostegui.json
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:34 jbond: upload new puppetboard_3.1.0-1+deb11u1_all.deb
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:33 moritzm: installing apache security updates on stretch
* 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:00 mmandere: pool cp1081 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:58 moritzm: restarting apache on matomo1002/piwik.wikimedia.org
* 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS buster
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 10:30 moritzm: restarting ntpd
* 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 10:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:24 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P23002 and previous config saved to /var/cache/conftool/dbconfig/20220323-101816-marostegui.json
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS buster
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:56 mmandere: depool cp1081 for reimage - [[phab:T290005|T290005]]
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:43 mmandere: pool cp1079 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 09:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS buster
* 09:24 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:17 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:15 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 09:11 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
* 08:54 moritzm: restarting spamassassin/clamav on otrs1001/ticket.wikimedia.org
* 08:51 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1079.eqiad.wmnet with OS buster
* 08:47 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
* 08:43 moritzm: installing openssl security updates
* 08:36 mmandere: depool cp1079 for reimage - [[phab:T290005|T290005]]
* 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
* 08:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
* 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
* 07:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P23001 and previous config saved to /var/cache/conftool/dbconfig/20220323-074408-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P23000 and previous config saved to /var/cache/conftool/dbconfig/20220323-072904-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22999 and previous config saved to /var/cache/conftool/dbconfig/20220323-071400-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22998 and previous config saved to /var/cache/conftool/dbconfig/20220323-065856-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22997 and previous config saved to /var/cache/conftool/dbconfig/20220323-064353-root.json
* 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1112.eqiad.wmnet with OS bullseye
* 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1112.eqiad.wmnet with OS bullseye
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for reimage', diff saved to https://phabricator.wikimedia.org/P22996 and previous config saved to /var/cache/conftool/dbconfig/20220323-060533-marostegui.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 with low weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22995 and previous config saved to /var/cache/conftool/dbconfig/20220323-060351-marostegui.json
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:20 ejegg: updated payments-wiki from {{Gerrit|3048f0aa}} to {{Gerrit|28e24856}}
* 00:11 cjming: end running skin preference update script [[phab:T299104|T299104]]


== 2022-03-22 ==
== 2022-09-19 ==
* 23:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 22:59 ebernhardson: [[phab:T317200|T317200]] start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
* 23:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
* 21:21 maryum: Deployed security patch for [[phab:T302479|T302479]]
* 23:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
* 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
* 23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:15 sbassett: Deployed security patch for [[phab:T312820|T312820]]
* 23:11 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:27 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 20:59 cjming: end of UTC late backport window
* 22:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: [[gerrit:833031{{!}}Add token_count subfield to outgoing_link (T317546)]] (duration: 03m 51s)
* 22:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:20 ryankemper: [[phab:T301511|T301511]] Mutated cirrus codfw cluster settings to what [I think] they should be, see https://phabricator.wikimedia.org/T301511#7798415; forcing re-check
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22993 and previous config saved to /var/cache/conftool/dbconfig/20220322-221503-marostegui.json
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820459{{!}}Wikifunctions: Drop two config items moved to docker]] (duration: 03m 38s)
* 22:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 22:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22992 and previous config saved to /var/cache/conftool/dbconfig/20220322-221455-marostegui.json
* 22:09 ryankemper: [[phab:T301511|T301511]] Forcing recheck of codfw cirrus setting check
* 22:04 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
* 22:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
* 21:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22991 and previous config saved to /var/cache/conftool/dbconfig/20220322-215950-marostegui.json
* 21:59 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 21:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
* 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
* 21:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 21:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22990 and previous config saved to /var/cache/conftool/dbconfig/20220322-214445-marostegui.json
* 21:39 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 21:35 ryankemper: [[phab:T301511|T301511]] Fixed elastic* eqiad cross-cluster search settings (see https://phabricator.wikimedia.org/T301511#7798267) to resolve the `ElasticSearch setting check` alerts in eqiad
* 21:33 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22989 and previous config saved to /var/cache/conftool/dbconfig/20220322-212939-marostegui.json
* 21:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:18 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:05 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 urbanecm: UTC late backport window done
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce18d4eeb255349e27163d5e5472fbe21c320322}}: testwiki: enable testing of topics match mode for GLAM events ([[phab:T301825|T301825]]) (duration: 01m 06s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|17caf0359b99b69c0b3e0d7a5fa2f5c7fb7464ef}}: Enable EventGate logging for WikipediaPortal schema ([[phab:T271163|T271163]]) (duration: 01m 54s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22986 and previous config saved to /var/cache/conftool/dbconfig/20220322-191049-marostegui.json
* 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:829877{{!}}ExtensionDistributor: Add REL1_39 (T313925)]] (duration: 03m 38s)
* 19:04 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 20:12 cjming@deploy1002: Finished scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] (duration: 06m 31s)
* 19:02 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22985 and previous config saved to /var/cache/conftool/dbconfig/20220322-185542-marostegui.json
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22984 and previous config saved to /var/cache/conftool/dbconfig/20220322-184037-marostegui.json
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:30 razzi: remove old karapace1001 known hosts following reimage: `razzi@puppetmaster1001:~$ ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "karapace1001.eqiad.wmnet"`
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22982 and previous config saved to /var/cache/conftool/dbconfig/20220322-182531-marostegui.json
* 20:06 cjming@deploy1002: cjming and arlolra: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 18:01 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@c4d0736]: (no justification provided) (duration: 05m 16s)
* 20:06 cjming@deploy1002: Started scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]]
* 17:55 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@c4d0736]: (no justification provided)
* 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 17:50 dcausse@deploy1002: Started scap: (no justification provided)
* 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1004.eqiad.wmnet with OS bullseye
* 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22981 and previous config saved to /var/cache/conftool/dbconfig/20220322-173301-marostegui.json
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22980 and previous config saved to /var/cache/conftool/dbconfig/20220322-173253-marostegui.json
* 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
* 17:25 brennen: trainsperiment ([[phab:T300203|T300203]]): with 1.39.0-wmf.3 on all wikis, we're paused for a planned catchup window - nothing to do at the moment, we'll deploy 1.39.0-wmf.4 tomorrow (2022-03-23).
* 17:42 dancy@deploy1002: Installing scap version "4.21.0" for 561 hosts
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:36 dancy@deploy1002: Sync cancelled.
* 17:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:36 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:36 dancy@deploy1002: Started scap: testing, disregard
* 17:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:03 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage<nowiki>{</nowiki>.png,-1.5x.png,-2x.png<nowiki>}</nowiki> ([[phab:T317718|T317718]])
* 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22979 and previous config saved to /var/cache/conftool/dbconfig/20220322-171748-marostegui.json
* 14:02 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|6c7151d969b6997bd9cce042b7bc78c282dd9b26}}: Regenerate ukwikivoyage logo ([[phab:T317718|T317718]]) (duration: 03m 46s)
* 17:15 taavi: deploy security patch for [[phab:T304354|T304354]]
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1004.eqiad.wmnet with reason: host reimage
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1004.eqiad.wmnet with reason: host reimage
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22978 and previous config saved to /var/cache/conftool/dbconfig/20220322-170243-marostegui.json
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22974 and previous config saved to /var/cache/conftool/dbconfig/20220322-164738-marostegui.json
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1004.eqiad.wmnet with OS bullseye
* 13:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbf161d148228e0e706813f923ab1a5d4b42757a}}: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro ([[phab:T314518|T314518]]) (duration: 04m 01s)
* 16:35 ebernhardson: [[phab:T303548|T303548]] start wikidatawiki reindexing on eqiad codfw and cloudelastic cirrus clusters
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1003.eqiad.wmnet with OS bullseye
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22973 and previous config saved to /var/cache/conftool/dbconfig/20220322-162917-marostegui.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22972 and previous config saved to /var/cache/conftool/dbconfig/20220322-162904-marostegui.json
* 07:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a6c1ddf5cd1a46ab05f5d6fda4b938a3ee37238}}: Remove unnecessary wgNamespaceAliases from bnwiki ([[phab:T318003|T318003]]) (duration: 04m 16s)
* 16:27 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:27 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 16:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1003.eqiad.wmnet with OS bullseye
* 16:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 16:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22971 and previous config saved to /var/cache/conftool/dbconfig/20220322-161359-marostegui.json
* 16:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 16:13 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:11 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:09 btullis@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 16:07 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudnet1003.eqiad.wmnet with OS bullseye
* 16:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 15:59 moritzm: imported jvmquake 1.0.1 for stretch/buster (JDK8) and bullseye (JDK11)
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22970 and previous config saved to /var/cache/conftool/dbconfig/20220322-155854-marostegui.json
* 15:56 btullis@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1003.eqiad.wmnet with OS bullseye
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22969 and previous config saved to /var/cache/conftool/dbconfig/20220322-154349-marostegui.json
* 15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
* 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22968 and previous config saved to /var/cache/conftool/dbconfig/20220322-152508-marostegui.json
* 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 15:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22967 and previous config saved to /var/cache/conftool/dbconfig/20220322-152247-root.json
* 15:17 hashar: Gerrit 3.3.10 up and running [[phab:T304226|T304226]]
* 15:14 hashar: Stopping Gerrit for security update [[phab:T304226|T304226]]
* 15:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit1001 [[phab:T304226|T304226]] (duration: 00m 10s)
* 15:13 hashar@deploy1002: Started deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit1001 [[phab:T304226|T304226]]
* 15:10 hashar: Upgrading and starting Gerrit on gerrit2001 (replica)
* 15:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 15:06 hashar@deploy1002: Finished deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit2001 [[phab:T304226|T304226]] (duration: 00m 12s)
* 15:06 hashar@deploy1002: Started deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit2001 [[phab:T304226|T304226]]
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22965 and previous config saved to /var/cache/conftool/dbconfig/20220322-144855-marostegui.json
* 14:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 14:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22964 and previous config saved to /var/cache/conftool/dbconfig/20220322-144847-marostegui.json
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22963 and previous config saved to /var/cache/conftool/dbconfig/20220322-143341-marostegui.json
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22962 and previous config saved to /var/cache/conftool/dbconfig/20220322-141836-marostegui.json
* 13:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
* 13:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 13:44 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1004.eqiad.wmnet
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22960 and previous config saved to /var/cache/conftool/dbconfig/20220322-134148-marostegui.json
* 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:40 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 13:40 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 13:36 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1004.eqiad.wmnet
* 13:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1003.eqiad.wmnet
* 13:33 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1003.eqiad.wmnet
* 13:27 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]] (duration: 00m 52s)
* 13:26 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 13:26 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
* 13:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
* 13:20 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
* 13:19 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudgw2002-dev.codfw.wmnet
* 13:19 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
* 12:54 moritzm: installing 5.10.103 kernels on servers running a kernel from buster backports [[phab:T303179|T303179]]
* 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 12:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22959 and previous config saved to /var/cache/conftool/dbconfig/20220322-124117-root.json
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22958 and previous config saved to /var/cache/conftool/dbconfig/20220322-124109-root.json
* 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P22957 and previous config saved to /var/cache/conftool/dbconfig/20220322-123056-marostegui.json
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22956 and previous config saved to /var/cache/conftool/dbconfig/20220322-122613-root.json
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22955 and previous config saved to /var/cache/conftool/dbconfig/20220322-122605-root.json
* 12:24 marostegui: dbmaint s3@eqiad [[phab:T300600|T300600]]
* 12:24 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:772817{{!}}Enable WRITE BOTH on rest of s6 for templatelinks normalization (T299421)]] (duration: 00m 54s)
* 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:21 marostegui: dbmaint s7@eqiad [[phab:T300992|T300992]]
* 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:18 marostegui: dbmaint s6@eqiad [[phab:T300992|T300992]]
* 12:17 marostegui: dbmaint s5@eqiad [[phab:T300992|T300992]]
* 12:16 marostegui: dbmaint s8@eqiad [[phab:T300992|T300992]]
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:12 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:772816{{!}}Enable WRITE BOTH for templatelinks normalization in wikitech (T299421)]] (duration: 01m 41s)
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22954 and previous config saved to /var/cache/conftool/dbconfig/20220322-121110-root.json
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22953 and previous config saved to /var/cache/conftool/dbconfig/20220322-121101-root.json
* 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22952 and previous config saved to /var/cache/conftool/dbconfig/20220322-120123-marostegui.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22951 and previous config saved to /var/cache/conftool/dbconfig/20220322-115606-root.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22950 and previous config saved to /var/cache/conftool/dbconfig/20220322-115557-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22949 and previous config saved to /var/cache/conftool/dbconfig/20220322-114618-marostegui.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22948 and previous config saved to /var/cache/conftool/dbconfig/20220322-114102-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22946 and previous config saved to /var/cache/conftool/dbconfig/20220322-114051-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22945 and previous config saved to /var/cache/conftool/dbconfig/20220322-113113-marostegui.json
* 11:31 marostegui: Reboot db1100 and db1123 for kernel upgrade before master swap
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 for reboot', diff saved to https://phabricator.wikimedia.org/P22944 and previous config saved to /var/cache/conftool/dbconfig/20220322-113003-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 for reboot', diff saved to https://phabricator.wikimedia.org/P22943 and previous config saved to /var/cache/conftool/dbconfig/20220322-112931-marostegui.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22942 and previous config saved to /var/cache/conftool/dbconfig/20220322-111607-marostegui.json
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:46 mmandere: pool cp1077 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1077.eqiad.wmnet with OS buster
* 10:26 _joe_: running check-restart-php on api appservers
* 10:22 _joe_: running check-and-restart on mw-eqiad-appservers
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22940 and previous config saved to /var/cache/conftool/dbconfig/20220322-101354-marostegui.json
* 10:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22939 and previous config saved to /var/cache/conftool/dbconfig/20220322-101346-marostegui.json
* 10:03 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22938 and previous config saved to /var/cache/conftool/dbconfig/20220322-095841-marostegui.json
* 09:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
* 09:54 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]] (duration: 62m 07s)
* 09:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
* 09:46 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudcontrol1005.wikimedia.org with reason: dcaro testing backups
* 09:46 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cloudcontrol1005.wikimedia.org with reason: dcaro testing backups
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22937 and previous config saved to /var/cache/conftool/dbconfig/20220322-094335-marostegui.json
* 09:34 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1077.eqiad.wmnet with OS buster
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22936 and previous config saved to /var/cache/conftool/dbconfig/20220322-092830-marostegui.json
* 09:25 mmandere: depool cp1077 for reimage - [[phab:T290005|T290005]]
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P22935 and previous config saved to /var/cache/conftool/dbconfig/20220322-091718-root.json
* 09:11 dcausse: restarted blazegraph on wdqs2002 (deadlocked)
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P22934 and previous config saved to /var/cache/conftool/dbconfig/20220322-090214-root.json
* 08:59 XioNoX: drmrs propagate LVS med to core routers
* 08:52 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.3  refs [[phab:T300203|T300203]]
* 08:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1008.eqiad.wmnet with OS bullseye
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22933 and previous config saved to /var/cache/conftool/dbconfig/20220322-084710-root.json
* 08:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
* 08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22932 and previous config saved to /var/cache/conftool/dbconfig/20220322-083206-root.json
* 08:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1008.eqiad.wmnet with OS bullseye
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22931 and previous config saved to /var/cache/conftool/dbconfig/20220322-081806-marostegui.json
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22930 and previous config saved to /var/cache/conftool/dbconfig/20220322-081758-marostegui.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22929 and previous config saved to /var/cache/conftool/dbconfig/20220322-081702-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22928 and previous config saved to /var/cache/conftool/dbconfig/20220322-080713-marostegui.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22927 and previous config saved to /var/cache/conftool/dbconfig/20220322-080253-marostegui.json
* 07:57 urbanecm: UTC morning backport window completed
* 07:57 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/GrowthExperiments/modules/ext.growthExperiments.MentorDashboard/MenteeOverview/MenteeOverviewPresets.js: {{Gerrit|84877bd}}: MenteeOverviewPresets.getUsersToShow: Fix typo ([[phab:T304353|T304353]]) (duration: 00m 49s)
* 07:53 elukey: restart php-fpm on mw1449 - opcache full after deployment
* 07:49 elukey: restart php-fpm on mw1448 - high cpu usage right after yesterday's deployment at 21 UTC
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22925 and previous config saved to /var/cache/conftool/dbconfig/20220322-074748-marostegui.json
* 07:47 elukey: depool mw1448 manually on the node (high cpu usage from php-fpm)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22924 and previous config saved to /var/cache/conftool/dbconfig/20220322-073243-marostegui.json
* 07:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8151bf2}}: Allow flooders to remove the group from themselves in viwiki ([[phab:T303578|T303578]]) (duration: 00m 50s)
* 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye
* 07:17 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|caad5a4df35c0daa5fd3423e4abf5aa4d5c38a7a}}: wgCrossSiteAJAXdomains: Add foundationwiki and <nowiki>{</nowiki>ee,ge,punjabi<nowiki>}</nowiki>wikimedia ([[phab:T300978|T300978]]) (duration: 00m 49s)
* 07:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b4a9935}}: Create "editautopatrolprotected" protection level for viwiki ([[phab:T303579|T303579]]) (duration: 00m 57s)
* 07:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
* 07:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
* 06:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22923 and previous config saved to /var/cache/conftool/dbconfig/20220322-064230-marostegui.json
* 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22922 and previous config saved to /var/cache/conftool/dbconfig/20220322-064222-marostegui.json
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22921 and previous config saved to /var/cache/conftool/dbconfig/20220322-063223-marostegui.json
* 06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 06:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22920 and previous config saved to /var/cache/conftool/dbconfig/20220322-062717-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 to s1 with minimal weight [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22919 and previous config saved to /var/cache/conftool/dbconfig/20220322-062310-marostegui.json
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 to dbctl [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P22918 and previous config saved to /var/cache/conftool/dbconfig/20220322-062140-marostegui.json
* 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS bullseye
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22917 and previous config saved to /var/cache/conftool/dbconfig/20220322-061212-marostegui.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22916 and previous config saved to /var/cache/conftool/dbconfig/20220322-055707-marostegui.json
* 05:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
* 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
* 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
* 03:47 eileen: civicrm revision changed from {{Gerrit|457adec4}} to {{Gerrit|b6ceb722}}
* 02:56 eileen: civicrm revision changed from {{Gerrit|30c55f51}} to {{Gerrit|457adec4}}
* 02:56 eileen: revision changed from {{Gerrit|30c55f51}} to {{Gerrit|457adec4}}
* 02:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 02:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 01:35 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
* 00:35 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye


== 2022-03-21 ==
== 2022-09-17 ==
* 23:52 eileen: civicrm revision changed from {{Gerrit|52c45874}} to {{Gerrit|30c55f51}}
* 12:17 Emperor: set thanos ring replicas to 3.80 [[phab:T311690|T311690]]
* 22:29 ryankemper: [[phab:T301955|T301955]] Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34879 and previous config saved to /var/cache/conftool/dbconfig/20220917-103903-ladsgroup.json
* 22:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34878 and previous config saved to /var/cache/conftool/dbconfig/20220917-102356-ladsgroup.json
* 22:04 reedy@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: [[phab:T304350|T304350]] (duration: 00m 49s)
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34877 and previous config saved to /var/cache/conftool/dbconfig/20220917-100850-ladsgroup.json
* 22:03 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: [[phab:T304350|T304350]] (duration: 00m 49s)
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34876 and previous config saved to /var/cache/conftool/dbconfig/20220917-095344-ladsgroup.json
* 21:59 ryankemper: [[phab:T301955|T301955]] Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34875 and previous config saved to /var/cache/conftool/dbconfig/20220917-094856-ladsgroup.json
* 21:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34874 and previous config saved to /var/cache/conftool/dbconfig/20220917-093349-ladsgroup.json
* 21:46 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34873 and previous config saved to /var/cache/conftool/dbconfig/20220917-091843-ladsgroup.json
* 21:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34872 and previous config saved to /var/cache/conftool/dbconfig/20220917-090336-ladsgroup.json
* 21:45 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - [[phab:T301955|T301955]]
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34871 and previous config saved to /var/cache/conftool/dbconfig/20220917-074806-ladsgroup.json
* 21:45 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34870 and previous config saved to /var/cache/conftool/dbconfig/20220917-073300-ladsgroup.json
* 21:45 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34869 and previous config saved to /var/cache/conftool/dbconfig/20220917-071753-ladsgroup.json
* 21:44 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34868 and previous config saved to /var/cache/conftool/dbconfig/20220917-070247-ladsgroup.json
* 21:44 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
* 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34867 and previous config saved to /var/cache/conftool/dbconfig/20220917-051719-ladsgroup.json
* 21:43 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
* 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 21:43 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 21:41 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34866 and previous config saved to /var/cache/conftool/dbconfig/20220917-051527-ladsgroup.json
* 21:41 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 21:41 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 21:41 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34865 and previous config saved to /var/cache/conftool/dbconfig/20220917-051203-ladsgroup.json
* 21:40 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
* 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 21:36 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 21:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 21:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 21:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 21:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 21:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 21:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 21:28 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 21:28 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 21:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 21:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 21:26 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 21:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 21:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 21:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
* 21:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
* 21:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 21:03 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:56 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]]
* 20:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:49 dduvall@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]] (duration: 00m 51s)
* 20:49 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]]
* 20:31 urbanecm: UTC late backport window completed
* 20:29 urbanecm@deploy1002: Synchronized docroot/noc/db.php: {{Gerrit|3bcccdc}}: Migrate away from $wmfDbconfigFromEtcd ([[phab:T45956|T45956]]; 2/2) (duration: 00m 50s)
* 20:29 urbanecm@deploy1002: Synchronized wmf-config/etcd.php: {{Gerrit|3bcccdc}}: Migrate away from $wmfDbconfigFromEtcd ([[phab:T45956|T45956]]; 1/2) (duration: 00m 50s)
* 20:19 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|8347de5}}: ExtensionDistributor: Add REL1_38 ([[phab:T304185|T304185]]) (duration: 00m 51s)
* 19:48 brennen: mw1416: sudo -i /usr/local/sbin/restart-php7.2-fpm
* 19:42 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]]
* 19:26 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]] (duration: 64m 33s)
* 18:54 ebernhardson: [[phab:T303548|T303548]] start commonswiki reindexing on eqiad codfw and cloudelastic cirrus clusters
* 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22906 and previous config saved to /var/cache/conftool/dbconfig/20220321-185042-marostegui.json
* 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22905 and previous config saved to /var/cache/conftool/dbconfig/20220321-183537-marostegui.json
* 18:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.2  refs [[phab:T300203|T300203]]
* 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22904 and previous config saved to /var/cache/conftool/dbconfig/20220321-182032-marostegui.json
* 18:19 otto@deploy1002: Finished deploy [analytics/refinery@2175d63]: gobblin prometheus metrics for all jobs - [[phab:T294420|T294420]] (duration: 04m 41s)
* 18:19 brennen: trainsperiment ([[phab:T300203|T300203]]): 1.39.0-wmf.1 on all wikis; starting prep of wmf.2, will abort if needed
* 18:15 otto@deploy1002: Started deploy [analytics/refinery@2175d63]: gobblin prometheus metrics for all jobs - [[phab:T294420|T294420]]
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22903 and previous config saved to /var/cache/conftool/dbconfig/20220321-180526-marostegui.json
* 18:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 18:03 otto@deploy1002: Finished deploy [analytics/refinery@2175d63] (hadoop-test): gobblin prometheus metrics for all jobs - [[phab:T294420|T294420]] (duration: 07m 19s)
* 17:59 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1021 for [[phab:T302233|T302233]]
* 17:59 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1020 for [[phab:T302233|T302233]]
* 17:57 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1016 for [[phab:T302233|T302233]]
* 17:55 otto@deploy1002: Started deploy [analytics/refinery@2175d63] (hadoop-test): gobblin prometheus metrics for all jobs - [[phab:T294420|T294420]]
* 17:53 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 17:51 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1017 for [[phab:T302233|T302233]]
* 17:49 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1013 for [[phab:T302233|T302233]]
* 17:49 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1018 for [[phab:T302233|T302233]]
* 17:46 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1014
* 17:41 ryankemper: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
* 17:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2b67de7] (wcqs): Deploy 0.3.107 to WCQS (duration: 02m 12s)
* 17:38 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.107` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
* 17:37 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2b67de7] (wcqs): Deploy 0.3.107 to WCQS
* 17:35 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1018 after same command without `--table` argument timed out waiting for `zhwiki_p.page`
* 17:32 ryankemper: [Maps] Running puppet agent on rest of `maps*`: `ryankemper@cumin1001:~$ sudo -E cumin -b 4 'maps*' 'run-puppet-agent'`
* 17:31 ryankemper: [Maps] Ran puppet agent on maps master `maps1009` to verify puppet patch works; looks like osm import was disabled as intended `Notice: /Stage[main]/Osm::Imposm3/Systemd::Service[imposm]/Service[imposm]/ensure: ensure changed 'running' to 'stopped'`
* 17:26 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 17:25 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 17:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 17:22 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2b67de7]: 0.3.107 (duration: 08m 26s)
* 17:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.107` on canary `wdqs1003`; proceeding to rest of fleet
* 17:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2b67de7]: 0.3.107
* 17:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.107`. Pre-deploy tests passing on canary `wdqs1003`
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22902 and previous config saved to /var/cache/conftool/dbconfig/20220321-170731-marostegui.json
* 17:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 16:58 brennen: trainsperiment ([[phab:T300203|T300203]]): blockers currently cleared, will hold wmf.1 -> group2 until 18:00 UTC, per deployment calendar
* 16:55 taavi@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/WikimediaEvents/includes/PageSplitter/PageSplitterHooks.php: Backport: [[gerrit:772364{{!}}PageSplitter: check for OutputPage::getTitle() returning null (T304331)]] (duration: 00m 50s)
* 16:53 taavi@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaEvents/includes/PageSplitter/PageSplitterHooks.php: Backport: [[gerrit:772363{{!}}PageSplitter: check for OutputPage::getTitle() returning null (T304331)]] (duration: 00m 51s)
* 16:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 16:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 16:46 razzi@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
* 16:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Wikibase/repo/: Backport: [[gerrit:772361{{!}}Add display to wbsearchentities response even if empty (T104344)]] (duration: 00m 53s)
* 16:15 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 16:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 16:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 16:13 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 16:12 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 16:12 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
* 16:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
* 16:11 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 16:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 16:11 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 16:11 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 16:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
* 16:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 16:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 16:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
* 16:09 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
* 16:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22901 and previous config saved to /var/cache/conftool/dbconfig/20220321-160557-marostegui.json
* 16:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
* 16:04 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 16:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 16:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 16:00 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 16:00 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 15:59 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 15:59 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 15:58 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 15:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:57 otto@deploy1002: Finished deploy [analytics/refinery@cd7bf7a] (hadoop-test): fix prometheus pushgateway url - [[phab:T294420|T294420]] (duration: 07m 18s)
* 15:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 15:57 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 15:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 15:57 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
* 15:56 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: [[phab:T304111|T304111]] (duration: 00m 50s)
* 15:56 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
* 15:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
* 15:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
* 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22900 and previous config saved to /var/cache/conftool/dbconfig/20220321-155052-marostegui.json
* 15:50 otto@deploy1002: Started deploy [analytics/refinery@cd7bf7a] (hadoop-test): fix prometheus pushgateway url - [[phab:T294420|T294420]]
* 15:50 otto@deploy1002: Finished deploy [analytics/refinery@33f66db] (hadoop-test): fix prometheus pushgateway url - [[phab:T294420|T294420]] (duration: 00m 03s)
* 15:50 otto@deploy1002: Started deploy [analytics/refinery@33f66db] (hadoop-test): fix prometheus pushgateway url - [[phab:T294420|T294420]]
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22899 and previous config saved to /var/cache/conftool/dbconfig/20220321-154607-marostegui.json
* 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22898 and previous config saved to /var/cache/conftool/dbconfig/20220321-154559-marostegui.json
* 15:44 reedy@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/Flow/maintenance: [[phab:T304318|T304318]] (duration: 00m 49s)
* 15:44 razzi@cumin1001: START - Cookbook sre.wikireplicas.update-views
* 15:43 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Flow/maintenance: [[phab:T304318|T304318]] (duration: 00m 51s)
* 15:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:772421{{!}} Bumping portals to master (T128546)]] (duration: 00m 53s)
* 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:772421{{!}} Bumping portals to master (T128546)]] (duration: 00m 53s)
* 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22897 and previous config saved to /var/cache/conftool/dbconfig/20220321-153547-marostegui.json
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22896 and previous config saved to /var/cache/conftool/dbconfig/20220321-153054-marostegui.json
* 15:27 mmandere: pool cp1075 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:23 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]] (duration: 01m 54s)
* 15:21 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22895 and previous config saved to /var/cache/conftool/dbconfig/20220321-152041-marostegui.json
* 15:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1075.eqiad.wmnet with OS buster
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22894 and previous config saved to /var/cache/conftool/dbconfig/20220321-151549-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22893 and previous config saved to /var/cache/conftool/dbconfig/20220321-150417-marostegui.json
* 15:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 15:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 15:01 otto@deploy1002: Finished deploy [analytics/refinery@33f66db] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]] (duration: 07m 10s)
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22891 and previous config saved to /var/cache/conftool/dbconfig/20220321-150044-marostegui.json
* 14:58 XioNoX: asw2-b-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 2 member 5 - [[phab:T304316|T304316]]
* 14:54 otto@deploy1002: Started deploy [analytics/refinery@33f66db] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]]
* 14:49 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
* 14:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
* 14:43 oblivian@puppetmaster1001: conftool action : set/enabled=false; selector: name=parameter_q,cluster=cache-text
* 14:35 hashar: Restarting CI Zuul server
* 14:31 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 14:31 hashar: restarting Apache on gerrit2001 and gerrit1001
* 14:30 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1075.eqiad.wmnet with OS buster
* 14:28 hashar@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Echo: Revert "Call IDatabase::timestamp before inserting rows" - [[phab:T304307|T304307]] (duration: 00m 52s)
* 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22890 and previous config saved to /var/cache/conftool/dbconfig/20220321-141922-marostegui.json
* 14:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 14:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 14:07 mmandere: depool cp1075  - [[phab:T290005|T290005]]
* 14:07 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 14:05 mmandere: depool cp1074 -  [[phab:T290005|T290005]]
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22889 and previous config saved to /var/cache/conftool/dbconfig/20220321-140417-marostegui.json
* 14:04 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 14:03 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 14:03 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 14:02 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 14:01 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 13:55 otto@deploy1002: Finished deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]] (duration: 00m 06s)
* 13:55 otto@deploy1002: Started deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]]
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22888 and previous config saved to /var/cache/conftool/dbconfig/20220321-134912-marostegui.json
* 13:34 Lucas_WMDE: UTC afternoon backport window done
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22887 and previous config saved to /var/cache/conftool/dbconfig/20220321-133407-marostegui.json
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771923{{!}}Revert "ptwiki: Disable Growth's image recommendation" (T304095)]] (duration: 00m 49s)
* 13:29 otto@deploy1002: Finished deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]] (duration: 08m 53s)
* 13:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:769748{{!}}Remove unused wgWMESearchRelevancePages config variable]] (duration: 00m 50s)
* 13:20 otto@deploy1002: Started deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - [[phab:T297939|T297939]]
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771998{{!}}Create "pagemover" group at azwiki (T303752)]] (duration: 00m 50s)
* 13:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:771691{{!}}Remove changetags right from users on wikidatawiki and testwikidatawiki (T303682)]] (while keeping applychangetags right) (duration: 00m 49s)
* 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 12:48 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 12:42 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.38.0-wmf.26"
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22885 and previous config saved to /var/cache/conftool/dbconfig/20220321-123055-marostegui.json
* 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22884 and previous config saved to /var/cache/conftool/dbconfig/20220321-123042-marostegui.json
* 12:22 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22883 and previous config saved to /var/cache/conftool/dbconfig/20220321-121537-marostegui.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22882 and previous config saved to /var/cache/conftool/dbconfig/20220321-120032-marostegui.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22881 and previous config saved to /var/cache/conftool/dbconfig/20220321-114527-marostegui.json
* 11:41 jnuche@deploy1002: Pruned MediaWiki: 1.38.0-wmf.25 (duration: 01m 32s)
* 11:38 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@cbc85d3] (eqiad): Update kartotherian to {{Gerrit|2ef5c2d}} (duration: 01m 40s)
* 11:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@cbc85d3] (eqiad): Update kartotherian to {{Gerrit|2ef5c2d}}
* 11:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@cbc85d3] (codfw): Update kartotherian to {{Gerrit|2ef5c2d}} (duration: 02m 51s)
* 11:35 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]] (duration: 81m 15s)
* 11:33 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@cbc85d3] (codfw): Update kartotherian to {{Gerrit|2ef5c2d}}
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22880 and previous config saved to /var/cache/conftool/dbconfig/20220321-112217-marostegui.json
* 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22879 and previous config saved to /var/cache/conftool/dbconfig/20220321-112210-marostegui.json
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22878 and previous config saved to /var/cache/conftool/dbconfig/20220321-110705-marostegui.json
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22877 and previous config saved to /var/cache/conftool/dbconfig/20220321-105159-marostegui.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22876 and previous config saved to /var/cache/conftool/dbconfig/20220321-103654-marostegui.json
* 10:13 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.1  refs [[phab:T300203|T300203]]
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22875 and previous config saved to /var/cache/conftool/dbconfig/20220321-094614-marostegui.json
* 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:32 hashar: 1.39.0-wmf.1 train is delayed due to a CI / npm build failure which is being resolved [[phab:T300203|T300203]]
* 09:08 dcausse: restarting blazegraph on wdqs2001 (stuck)
* 09:07 moritzm: restarting FPM
* 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 09:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22874 and previous config saved to /var/cache/conftool/dbconfig/20220321-090250-marostegui.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22873 and previous config saved to /var/cache/conftool/dbconfig/20220321-084745-marostegui.json
* 08:43 hashar: Train blocked due to a npm checksum mismatch preventing CI from merging in the mediawiki/core 1.39.0-wmf.1 change which create the branch. [[phab:T304286|T304286]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22872 and previous config saved to /var/cache/conftool/dbconfig/20220321-083240-marostegui.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22871 and previous config saved to /var/cache/conftool/dbconfig/20220321-083050-root.json
* 08:23 dcausse: restarting blazegraph on wdqs2003 (stuck for 16 hours)
* 08:19 moritzm: installing openssl security updates
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22870 and previous config saved to /var/cache/conftool/dbconfig/20220321-081735-marostegui.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22869 and previous config saved to /var/cache/conftool/dbconfig/20220321-081546-root.json
* 08:09 dcausse: restarting blazegraph on wdqs2004 and wdqs2002 (BlazegraphFreeAllocatorsDecreasingRapidly)
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22868 and previous config saved to /var/cache/conftool/dbconfig/20220321-080042-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22867 and previous config saved to /var/cache/conftool/dbconfig/20220321-074538-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22866 and previous config saved to /var/cache/conftool/dbconfig/20220321-073033-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22865 and previous config saved to /var/cache/conftool/dbconfig/20220321-072902-marostegui.json
* 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22864 and previous config saved to /var/cache/conftool/dbconfig/20220321-072854-marostegui.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22863 and previous config saved to /var/cache/conftool/dbconfig/20220321-071349-marostegui.json
* 07:12 urbanecm: UTC morning B&C done
* 07:08 urbanecm: Create `wikishared.cx_significant_edits` and `wikishared.cx_section_translation` at x1 ([[phab:T302371|T302371]]; `mwscript sql.php --wiki=aawiki --wikidb=wikishared --cluster=extension1 /srv/mediawiki-staging/php-1.38.0-wmf.26/extensions/ContentTranslation/sql/<nowiki>{</nowiki>section-translations,significant-edits<nowiki>}</nowiki>.sql)`)
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22862 and previous config saved to /var/cache/conftool/dbconfig/20220321-065844-marostegui.json
* 06:43 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1175.eqiad.wmnet with OS bullseye
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22861 and previous config saved to /var/cache/conftool/dbconfig/20220321-064339-marostegui.json
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
* 06:19 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1175.eqiad.wmnet with OS bullseye
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
* 05:52 marostegui: dbmaint s5@eqiad [[phab:T300600|T300600]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 reimage [[phab:T300600|T300600]]', diff saved to https://phabricator.wikimedia.org/P22860 and previous config saved to /var/cache/conftool/dbconfig/20220321-055202-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22859 and previous config saved to /var/cache/conftool/dbconfig/20220321-054838-marostegui.json
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P22858 and previous config saved to /var/cache/conftool/dbconfig/20220321-054358-marostegui.json
* 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance


== 2022-03-20 ==
== 2022-09-16 ==
* 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22857 and previous config saved to /var/cache/conftool/dbconfig/20220320-234358-marostegui.json
* 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34864 and previous config saved to /var/cache/conftool/dbconfig/20220916-212905-ladsgroup.json
* 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22856 and previous config saved to /var/cache/conftool/dbconfig/20220320-234350-marostegui.json
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34863 and previous config saved to /var/cache/conftool/dbconfig/20220916-211358-ladsgroup.json
* 23:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22855 and previous config saved to /var/cache/conftool/dbconfig/20220320-232845-marostegui.json
* 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34862 and previous config saved to /var/cache/conftool/dbconfig/20220916-205852-ladsgroup.json
* 23:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22854 and previous config saved to /var/cache/conftool/dbconfig/20220320-231340-marostegui.json
* 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34861 and previous config saved to /var/cache/conftool/dbconfig/20220916-204345-ladsgroup.json
* 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22853 and previous config saved to /var/cache/conftool/dbconfig/20220320-225835-marostegui.json
* 19:16 mutante: cp1081 /usr/local/sbin/update-ocsp-all
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22850 and previous config saved to /var/cache/conftool/dbconfig/20220320-081713-marostegui.json
* 17:01 mutante: gitlab-runner*: deployed gerrit:832584 and systemctl restart buildkitd on 6 hosts for [[phab:T317904|T317904]]
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 16:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22849 and previous config saved to /var/cache/conftool/dbconfig/20220320-081705-marostegui.json
* 16:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22848 and previous config saved to /var/cache/conftool/dbconfig/20220320-080200-marostegui.json
* 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22847 and previous config saved to /var/cache/conftool/dbconfig/20220320-074655-marostegui.json
* 16:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22846 and previous config saved to /var/cache/conftool/dbconfig/20220320-073150-marostegui.json
* 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184
* 16:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2184
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2183
* 16:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2183
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34860 and previous config saved to /var/cache/conftool/dbconfig/20220916-161409-ladsgroup.json
* 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34859 and previous config saved to /var/cache/conftool/dbconfig/20220916-161346-ladsgroup.json
* 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34858 and previous config saved to /var/cache/conftool/dbconfig/20220916-155840-ladsgroup.json
* 15:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:52 dancy@deploy1002: Installation of scap version "4.20.0" completed for 561 hosts
* 15:51 dancy@deploy1002: Installing scap version "4.20.0" for 561 hosts
* 15:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:44 dancy@deploy1002: Finished scap: testing (duration: 04m 53s)
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34857 and previous config saved to /var/cache/conftool/dbconfig/20220916-154333-ladsgroup.json
* 15:39 dancy@deploy1002: Started scap: testing
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34856 and previous config saved to /var/cache/conftool/dbconfig/20220916-152827-ladsgroup.json
* 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:01 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:01 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:42 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:17 godog: add 100G to prometheus/eqiad instance k8s-mlserve
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34855 and previous config saved to /var/cache/conftool/dbconfig/20220916-131902-root.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34854 and previous config saved to /var/cache/conftool/dbconfig/20220916-130357-root.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34853 and previous config saved to /var/cache/conftool/dbconfig/20220916-125841-root.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34852 and previous config saved to /var/cache/conftool/dbconfig/20220916-124850-root.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34851 and previous config saved to /var/cache/conftool/dbconfig/20220916-124336-root.json
* 12:43 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34850 and previous config saved to /var/cache/conftool/dbconfig/20220916-123346-root.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34849 and previous config saved to /var/cache/conftool/dbconfig/20220916-122831-root.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34848 and previous config saved to /var/cache/conftool/dbconfig/20220916-121841-root.json
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34847 and previous config saved to /var/cache/conftool/dbconfig/20220916-121326-root.json
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34846 and previous config saved to /var/cache/conftool/dbconfig/20220916-120336-root.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34845 and previous config saved to /var/cache/conftool/dbconfig/20220916-115821-root.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34844 and previous config saved to /var/cache/conftool/dbconfig/20220916-114935-root.json
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34843 and previous config saved to /var/cache/conftool/dbconfig/20220916-114831-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34842 and previous config saved to /var/cache/conftool/dbconfig/20220916-114316-root.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P34841 and previous config saved to /var/cache/conftool/dbconfig/20220916-113543-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34840 and previous config saved to /var/cache/conftool/dbconfig/20220916-113431-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34839 and previous config saved to /var/cache/conftool/dbconfig/20220916-113325-root.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P34838 and previous config saved to /var/cache/conftool/dbconfig/20220916-112750-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34837 and previous config saved to /var/cache/conftool/dbconfig/20220916-111925-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34836 and previous config saved to /var/cache/conftool/dbconfig/20220916-110420-root.json
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34835 and previous config saved to /var/cache/conftool/dbconfig/20220916-105819-ladsgroup.json
* 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34834 and previous config saved to /var/cache/conftool/dbconfig/20220916-105809-ladsgroup.json
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34832 and previous config saved to /var/cache/conftool/dbconfig/20220916-104916-root.json
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34831 and previous config saved to /var/cache/conftool/dbconfig/20220916-104303-ladsgroup.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34830 and previous config saved to /var/cache/conftool/dbconfig/20220916-103411-root.json
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34829 and previous config saved to /var/cache/conftool/dbconfig/20220916-102756-ladsgroup.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34828 and previous config saved to /var/cache/conftool/dbconfig/20220916-101905-root.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34827 and previous config saved to /var/cache/conftool/dbconfig/20220916-101250-ladsgroup.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34826 and previous config saved to /var/cache/conftool/dbconfig/20220916-100400-root.json
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34825 and previous config saved to /var/cache/conftool/dbconfig/20220916-093635-root.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34824 and previous config saved to /var/cache/conftool/dbconfig/20220916-093121-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34823 and previous config saved to /var/cache/conftool/dbconfig/20220916-092130-root.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34822 and previous config saved to /var/cache/conftool/dbconfig/20220916-091616-root.json
* 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34821 and previous config saved to /var/cache/conftool/dbconfig/20220916-091234-ladsgroup.json
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34820 and previous config saved to /var/cache/conftool/dbconfig/20220916-090625-root.json
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34819 and previous config saved to /var/cache/conftool/dbconfig/20220916-090111-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34818 and previous config saved to /var/cache/conftool/dbconfig/20220916-085120-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34817 and previous config saved to /var/cache/conftool/dbconfig/20220916-084607-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34816 and previous config saved to /var/cache/conftool/dbconfig/20220916-083615-root.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34815 and previous config saved to /var/cache/conftool/dbconfig/20220916-083102-root.json
* 08:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34814 and previous config saved to /var/cache/conftool/dbconfig/20220916-082110-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34813 and previous config saved to /var/cache/conftool/dbconfig/20220916-081557-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34812 and previous config saved to /var/cache/conftool/dbconfig/20220916-080605-root.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34811 and previous config saved to /var/cache/conftool/dbconfig/20220916-080052-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34810 and previous config saved to /var/cache/conftool/dbconfig/20220916-075100-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34809 and previous config saved to /var/cache/conftool/dbconfig/20220916-074548-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34808 and previous config saved to /var/cache/conftool/dbconfig/20220916-074251-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P34807 and previous config saved to /var/cache/conftool/dbconfig/20220916-072958-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34806 and previous config saved to /var/cache/conftool/dbconfig/20220916-072746-root.json
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34805 and previous config saved to /var/cache/conftool/dbconfig/20220916-071241-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34804 and previous config saved to /var/cache/conftool/dbconfig/20220916-065737-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34803 and previous config saved to /var/cache/conftool/dbconfig/20220916-064232-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34802 and previous config saved to /var/cache/conftool/dbconfig/20220916-062727-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34801 and previous config saved to /var/cache/conftool/dbconfig/20220916-061222-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34800 and previous config saved to /var/cache/conftool/dbconfig/20220916-055717-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34799 and previous config saved to /var/cache/conftool/dbconfig/20220916-055542-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34798 and previous config saved to /var/cache/conftool/dbconfig/20220916-055424-root.json
* 05:51 marostegui: Install 10.6 on db1168 [[phab:T301879|T301879]]
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34797 and previous config saved to /var/cache/conftool/dbconfig/20220916-055031-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P34795 and previous config saved to /var/cache/conftool/dbconfig/20220916-054438-root.json
* 01:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
* 01:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 01:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
* 01:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 00:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 17s)
* 00:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)


== 2022-03-19 ==
== 2022-09-15 ==
* 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22845 and previous config saved to /var/cache/conftool/dbconfig/20220319-171757-marostegui.json
* 23:51 mutante: gerrit1001 - disabled puppet - gerrit:832411
* 17:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 17:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 22:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22844 and previous config saved to /var/cache/conftool/dbconfig/20220319-171749-marostegui.json
* 21:30 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22843 and previous config saved to /var/cache/conftool/dbconfig/20220319-170244-marostegui.json
* 20:25 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] (duration: 07m 06s)
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22842 and previous config saved to /var/cache/conftool/dbconfig/20220319-164739-marostegui.json
* 20:18 thcipriani@deploy1002: thcipriani and dani: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22841 and previous config saved to /var/cache/conftool/dbconfig/20220319-163234-marostegui.json
* 20:18 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]]
* 13:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 20:15 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] (duration: 07m 39s)
* 13:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 20:08 thcipriani@deploy1002: thcipriani and dcausse: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 20:07 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]]
* 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 19:26 ebernhardson: pool'd wdqs2001, some blockers before reload can start [[phab:T316236|T316236]]
* 13:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 18:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 13:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=piwiki --move-talk --fix # [[phab:T304201|T304201]]
* 18:39 dancy@deploy1002: Finished scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] (duration: 09m 53s)
* 13:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 18:38 cwhite: restart thanos-compact (thanos-fe2001) and swift_ring_manager (thanos-fe1001)
* 04:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:29 dancy@deploy1002: dancy and cscott: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 04:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 18:29 dancy@deploy1002: Started scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]]
* 04:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2003.codfw.wmnet on all recursors
* 03:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2003.codfw.wmnet on all recursors
* 03:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 03:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 03:28 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2001.codfw.wmnet on all recursors
* 03:28 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2001.codfw.wmnet on all recursors
* 03:28 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1003.eqiad.wmnet on all recursors
* 03:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1003.eqiad.wmnet on all recursors
* 02:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1002.eqiad.wmnet on all recursors
* 02:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1002.eqiad.wmnet on all recursors
* 02:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1001.eqiad.wmnet on all recursors
* 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22839 and previous config saved to /var/cache/conftool/dbconfig/20220319-015847-marostegui.json
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1001.eqiad.wmnet on all recursors
* 01:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 18:15 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 01:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 18:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22838 and previous config saved to /var/cache/conftool/dbconfig/20220319-015839-marostegui.json
* 18:13 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 01:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 18:07 godog: restart envoyproxy on thanos-fe*
* 01:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 18:06 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 01:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22837 and previous config saved to /var/cache/conftool/dbconfig/20220319-014334-marostegui.json
* 18:06 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 01:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22836 and previous config saved to /var/cache/conftool/dbconfig/20220319-012829-marostegui.json
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging BryanDavis out of all services on: 2047 hosts
* 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22835 and previous config saved to /var/cache/conftool/dbconfig/20220319-011324-marostegui.json
* 16:16 andrew@cumin1001: START - Cookbook sre.idm.logout Logging BryanDavis out of all services on: 2047 hosts
* 00:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 15:39 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 15:37 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 15:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 15:22 hnowlan: starting cassandra on sessionstore1001-a
* 15:18 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34792 and previous config saved to /var/cache/conftool/dbconfig/20220915-151131-ladsgroup.json
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34791 and previous config saved to /var/cache/conftool/dbconfig/20220915-145625-ladsgroup.json
* 14:41 moritzm: installing libtirpc security updates
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34790 and previous config saved to /var/cache/conftool/dbconfig/20220915-144118-ladsgroup.json
* 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34789 and previous config saved to /var/cache/conftool/dbconfig/20220915-142612-ladsgroup.json
* 14:01 sukhe: retarting bird.service on A:dns-auth for zlib update
* 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6b9784a0708cf1e7762034ccfba7e5604b2f6dc2}}: Enable the Vue version of the mentee overview in pilot wikis ([[phab:T300532|T300532]]) (duration: 03m 45s)
* 13:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d] (duration: 00m 09s)
* 13:58 aqu@deploy1002: Started deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d]
* 13:57 sukhe: retarting haproxy.service on A:dns-auth for zlib update
* 13:57 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d] (duration: 00m 10s)
* 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d]
* 13:51 jayme: updated rsyslog to 8.2208.0-1~bpo11+1 on all kubernetes masters and nodes - [[phab:T289766|T289766]]
* 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383] (duration: 06m 01s)
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:41 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383]
* 13:38 sukhe: restarting bird.service on A:dns-rec for zlib update
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:33 sukhe: restarting pdns-recursor on A:dns-rec for zlib update
* 13:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/GrowthExperiments/: {{Gerrit|f592e85858d17a2de99cde93627054ee4972c2bd}}: Mentee overview: avoid requiring the non-vue mentee overview script when loading the Vue one ([[phab:T300532|T300532]]) (duration: 04m 05s)
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 12:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 12:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 12:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS buster
* 12:17 jayme: fleet wide update of prometheus-rsyslog-exporter to 0.0.0+git20201008-4 - [[phab:T289766|T289766]]
* 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 12:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34787 and previous config saved to /var/cache/conftool/dbconfig/20220915-120013-root.json
* 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 11:50 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34786 and previous config saved to /var/cache/conftool/dbconfig/20220915-114508-root.json
* 11:44 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 11:43 moritzm: restart exim on lists1001 to pick up zlib security updates
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34785 and previous config saved to /var/cache/conftool/dbconfig/20220915-113003-root.json
* 11:22 jayme: importing prometheus-rsyslog-exporter 0.0.0+git20201008-4 to stretch-wikimedia, buster-wikimedia, bullseye-wikimedia - [[phab:T289766|T289766]]
* 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 11:15 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34784 and previous config saved to /var/cache/conftool/dbconfig/20220915-111458-root.json
* 11:12 hnowlan: sessionstore1001: c-foreach-nt drain
* 11:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 11:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'pool db2129 into s6 API', diff saved to https://phabricator.wikimedia.org/P34783 and previous config saved to /var/cache/conftool/dbconfig/20220915-110453-root.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34782 and previous config saved to /var/cache/conftool/dbconfig/20220915-105953-root.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34781 and previous config saved to /var/cache/conftool/dbconfig/20220915-104448-root.json
* 10:36 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34780 and previous config saved to /var/cache/conftool/dbconfig/20220915-102943-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34779 and previous config saved to /var/cache/conftool/dbconfig/20220915-101438-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34778 and previous config saved to /var/cache/conftool/dbconfig/20220915-101425-root.json
* 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P34777 and previous config saved to /var/cache/conftool/dbconfig/20220915-100212-root.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34775 and previous config saved to /var/cache/conftool/dbconfig/20220915-095920-root.json
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34774 and previous config saved to /var/cache/conftool/dbconfig/20220915-094415-root.json
* 09:38 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383] (duration: 14m 21s)
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34773 and previous config saved to /var/cache/conftool/dbconfig/20220915-092910-root.json
* 09:23 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383]
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34772 and previous config saved to /var/cache/conftool/dbconfig/20220915-091405-root.json
* 09:13 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383] (duration: 00m 08s)
* 09:13 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383]
* 09:12 aqu@deploy1002: Finished deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383] (duration: 27m 31s)
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34770 and previous config saved to /var/cache/conftool/dbconfig/20220915-085900-root.json
* 08:49 apergos: UTC backport training window closed at lsat
* 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 08:45 aqu@deploy1002: Started deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383]
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34768 and previous config saved to /var/cache/conftool/dbconfig/20220915-084355-root.json
* 08:43 aqu: about to deploy analytics/refinery
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34767 and previous config saved to /var/cache/conftool/dbconfig/20220915-084046-root.json
* 08:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34766 and previous config saved to /var/cache/conftool/dbconfig/20220915-082851-root.json
* 08:26 tsepothoabala@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832339{{!}}Enable action blocks on ptwiki (T317157)]] (duration: 04m 07s)
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34765 and previous config saved to /var/cache/conftool/dbconfig/20220915-082541-root.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34764 and previous config saved to /var/cache/conftool/dbconfig/20220915-082112-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34763 and previous config saved to /var/cache/conftool/dbconfig/20220915-081627-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 codfw master [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34762 and previous config saved to /var/cache/conftool/dbconfig/20220915-081517-marostegui.json
* 08:14 marostegui: Starting s6 codfw failover from db2129 to db2114 - [[phab:T317850|T317850]]
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34761 and previous config saved to /var/cache/conftool/dbconfig/20220915-081036-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34760 and previous config saved to /var/cache/conftool/dbconfig/20220915-080607-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 from API [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34759 and previous config saved to /var/cache/conftool/dbconfig/20220915-080157-root.json
* 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary codfw s6 [[phab:T317850|T317850]]
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34758 and previous config saved to /var/cache/conftool/dbconfig/20220915-080122-root.json
* 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary codfw s6 [[phab:T317850|T317850]]
* 07:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34757 and previous config saved to /var/cache/conftool/dbconfig/20220915-075531-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34756 and previous config saved to /var/cache/conftool/dbconfig/20220915-075102-root.json
* 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
* 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34755 and previous config saved to /var/cache/conftool/dbconfig/20220915-074026-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34754 and previous config saved to /var/cache/conftool/dbconfig/20220915-073557-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34753 and previous config saved to /var/cache/conftool/dbconfig/20220915-072520-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34752 and previous config saved to /var/cache/conftool/dbconfig/20220915-072053-root.json
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: reboot
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2151.codfw.wmnet with reason: reboot
* 07:14 moritzm: installing zlib security updates
* 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34751 and previous config saved to /var/cache/conftool/dbconfig/20220915-071015-root.json
* 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 07:06 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34750 and previous config saved to /var/cache/conftool/dbconfig/20220915-070548-root.json
* 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34749 and previous config saved to /var/cache/conftool/dbconfig/20220915-065510-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34748 and previous config saved to /var/cache/conftool/dbconfig/20220915-065043-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db2096 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34747 and previous config saved to /var/cache/conftool/dbconfig/20220915-064750-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34746 and previous config saved to /var/cache/conftool/dbconfig/20220915-064635-marostegui.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2096 to x1 primary and set section read-write [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34745 and previous config saved to /var/cache/conftool/dbconfig/20220915-064525-root.json
* 06:44 marostegui: Starting x1 codfw failover from db2115 to db2096 - [[phab:T317842|T317842]]
* 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T317842|T317842]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2096 with weight 0 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34744 and previous config saved to /var/cache/conftool/dbconfig/20220915-064014-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T317842|T317842]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34743 and previous config saved to /var/cache/conftool/dbconfig/20220915-063538-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34742 and previous config saved to /var/cache/conftool/dbconfig/20220915-061421-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 codfw [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34741 and previous config saved to /var/cache/conftool/dbconfig/20220915-061317-marostegui.json
* 06:12 marostegui: Starting s3 codfw failover from db2105 to db2127 - [[phab:T317839|T317839]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34740 and previous config saved to /var/cache/conftool/dbconfig/20220915-060307-root.json
* 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Codfw switchover s3 [[phab:T317839|T317839]]
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Codfw switchover s3 [[phab:T317839|T317839]]
* 05:32 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]


== 2022-03-18 ==
== 2022-09-14 ==
* 21:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34739 and previous config saved to /var/cache/conftool/dbconfig/20220914-220822-ladsgroup.json
* 21:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
* 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
* 21:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
* 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
* 15:38 jayme: powercycle kubernetes1002
* 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34738 and previous config saved to /var/cache/conftool/dbconfig/20220914-220744-ladsgroup.json
* 14:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: [[gerrit:771907{{!}}Don't pass the revision to PO access service (T304127)]] (duration: 00m 49s)
* 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34737 and previous config saved to /var/cache/conftool/dbconfig/20220914-215238-ladsgroup.json
* 14:12 XioNoX: configure NAT for civi1002 - [[phab:T304098|T304098]]
* 21:38 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 01m 48s)
* 14:02 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34736 and previous config saved to /var/cache/conftool/dbconfig/20220914-213732-ladsgroup.json
* 14:02 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 21:37 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
* 14:01 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 21:36 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
* 14:01 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 21:35 dduvall@deploy1002: Installation of scap version "4.19.1" completed for 561 hosts
* 13:59 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 21:35 dduvall@deploy1002: Installing scap version "4.19.1" for 561 hosts
* 13:59 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 21:34 dduvall: Deploying scap 4.19.1 (https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/832297/1/changelog)
* 13:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34735 and previous config saved to /var/cache/conftool/dbconfig/20220914-212225-ladsgroup.json
* 13:07 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:02 moritzm: imported python3.5  3.5.3-1+deb9u5+wmf1 to component/python35 [[phab:T303801|T303801]]
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:35 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:35 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 20:44 dancy@deploy1002: Sync cancelled.
* 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 20:44 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 20:44 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:29 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 20:44 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:28 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 20:40 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:09 vgutierrez: rolling restart of nginx on ncredir instances to catch up on OpenSSL updates
* 20:40 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:05 vgutierrez: restarting acme-chief and acme-chief API services to catch up on OpenSSL updates
* 20:39 dancy@deploy1002: Started scap: testing
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 20:38 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 05m 49s)
* 10:54 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 20:34 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 20:34 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:52 akosiaris: drain kubernetes200[1-4] [[phab:T303045|T303045]]
* 20:34 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:51 akosiaris: depool kubernetes200[1-4] [[phab:T303045|T303045]]
* 20:33 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2004.codfw.wmnet
* 20:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1 refs [[phab:T314190|T314190]]
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2003.codfw.wmnet
* 20:28 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2002.codfw.wmnet
* 20:24 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2001.codfw.wmnet
* 20:24 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:01 akosiaris: drain kubernetes100[1-4] [[phab:T303044|T303044]]
* 20:21 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:54 akosiaris: depool kubernetes100[1-4] from pybal [[phab:T303044|T303044]]
* 20:19 dancy@deploy1002: deploy-promote aborted: (duration: 08m 52s)
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1004.eqiad.wmnet
* 20:19 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 01m 24s)
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1003.eqiad.wmnet
* 20:18 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1002.eqiad.wmnet
* 20:18 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet
* 20:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:42 akosiaris: uncordon kubernetes1018-1022. [[phab:T293728|T293728]]. Nodes are live, ready to receive workloads and traffic.
* 20:13 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:37 akosiaris: pool kubernetes1018-1022 in pybal. [[phab:T293728|T293728]]
* 20:13 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:37 akosiaris: pool kubernetes1018-1022 in pybal.
* 20:12 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1022.eqiad.wmnet
* 20:09 dancy@deploy1002: Sync cancelled.
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1021.eqiad.wmnet
* 20:09 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1020.eqiad.wmnet
* 20:09 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1019.eqiad.wmnet
* 20:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1018.eqiad.wmnet
* 20:02 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22827 and previous config saved to /var/cache/conftool/dbconfig/20220318-093543-marostegui.json
* 20:02 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 20:02 dancy@deploy1002: Started scap: testing
* 09:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 20:01 TheresNoTime: Nothing to deploy in this UTC late backport window
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1022.eqiad.wmnet
* 19:57 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1021.eqiad.wmnet
* 19:57 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1020.eqiad.wmnet
* 19:55 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 12s)
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1019.eqiad.wmnet
* 19:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1018.eqiad.wmnet
* 19:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 19:51 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:08 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 19:49 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22826 and previous config saved to /var/cache/conftool/dbconfig/20220318-085517-marostegui.json
* 19:49 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22825 and previous config saved to /var/cache/conftool/dbconfig/20220318-084012-marostegui.json
* 19:49 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22824 and previous config saved to /var/cache/conftool/dbconfig/20220318-082507-marostegui.json
* 19:48 dancy@deploy1002: Started scap: testing
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22823 and previous config saved to /var/cache/conftool/dbconfig/20220318-081002-marostegui.json
* 19:46 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 23s)
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22822 and previous config saved to /var/cache/conftool/dbconfig/20220318-072852-root.json
* 19:46 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22821 and previous config saved to /var/cache/conftool/dbconfig/20220318-071758-marostegui.json
* 19:39 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 19:39 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 19:39 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22820 and previous config saved to /var/cache/conftool/dbconfig/20220318-071750-marostegui.json
* 19:38 dancy@deploy1002: Started scap: testing
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22819 and previous config saved to /var/cache/conftool/dbconfig/20220318-071348-root.json
* 19:38 dancy@deploy1002: sync-world aborted: testing (duration: 13m 25s)
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22818 and previous config saved to /var/cache/conftool/dbconfig/20220318-070245-marostegui.json
* 19:35 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22817 and previous config saved to /var/cache/conftool/dbconfig/20220318-065844-root.json
* 19:26 dancy: dancy@deploy1002 touch /var/lib/deploy-mwdebug/pause
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22816 and previous config saved to /var/cache/conftool/dbconfig/20220318-064740-marostegui.json
* 19:24 dancy@deploy1002: Started scap: testing
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22815 and previous config saved to /var/cache/conftool/dbconfig/20220318-064340-root.json
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22814 and previous config saved to /var/cache/conftool/dbconfig/20220318-063631-root.json
* 19:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling (duration: 02m 03s)
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22813 and previous config saved to /var/cache/conftool/dbconfig/20220318-063524-root.json
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22812 and previous config saved to /var/cache/conftool/dbconfig/20220318-063235-marostegui.json
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22811 and previous config saved to /var/cache/conftool/dbconfig/20220318-062836-root.json
* 19:15 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22810 and previous config saved to /var/cache/conftool/dbconfig/20220318-062127-root.json
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22809 and previous config saved to /var/cache/conftool/dbconfig/20220318-062020-root.json
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22808 and previous config saved to /var/cache/conftool/dbconfig/20220318-061332-root.json
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1179.eqiad.wmnet with OS bullseye
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22807 and previous config saved to /var/cache/conftool/dbconfig/20220318-060623-root.json
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22806 and previous config saved to /var/cache/conftool/dbconfig/20220318-060516-root.json
* 18:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
* 18:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki (duration: 02m 05s)
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
* 18:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22805 and previous config saved to /var/cache/conftool/dbconfig/20220318-055119-root.json
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22804 and previous config saved to /var/cache/conftool/dbconfig/20220318-055012-root.json
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1179.eqiad.wmnet with OS bullseye
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:39 marostegui: dbmaint on s3@eqiad [[phab:T300600|T300600]]
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 reimage [[phab:T300600|T300600]]', diff saved to https://phabricator.wikimedia.org/P22803 and previous config saved to /var/cache/conftool/dbconfig/20220318-053832-marostegui.json
* 18:36 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 04m 41s)
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146: