You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

imported>Stashbot
(TimStarling: restarted php7.2-fpm on the 9 canary hosts in eqiad T313770)
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42379 and previous config saved to /var/cache/conftool/dbconfig/20221206-012539-ladsgroup.json)
 
(123 intermediate revisions by the same user not shown)
== 2022-07-26 ==
== 2022-12-06 ==
* 00:11 TimStarling: restarted php7.2-fpm on the 9 canary hosts in eqiad [[phab:T313770|T313770]]
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42379 and previous config saved to /var/cache/conftool/dbconfig/20221206-012539-ladsgroup.json
* 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42378 and previous config saved to /var/cache/conftool/dbconfig/20221206-012510-ladsgroup.json
* 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42377 and previous config saved to /var/cache/conftool/dbconfig/20221206-011244-ladsgroup.json
* 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42376 and previous config saved to /var/cache/conftool/dbconfig/20221206-011128-ladsgroup.json
* 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 01:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42375 and previous config saved to /var/cache/conftool/dbconfig/20221206-011033-ladsgroup.json
* 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42374 and previous config saved to /var/cache/conftool/dbconfig/20221206-011003-ladsgroup.json
* 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42373 and previous config saved to /var/cache/conftool/dbconfig/20221206-005737-ladsgroup.json
* 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42372 and previous config saved to /var/cache/conftool/dbconfig/20221206-005526-ladsgroup.json
* 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42371 and previous config saved to /var/cache/conftool/dbconfig/20221206-005457-ladsgroup.json
* 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42370 and previous config saved to /var/cache/conftool/dbconfig/20221206-005401-ladsgroup.json
* 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
* 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
* 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42369 and previous config saved to /var/cache/conftool/dbconfig/20221206-005339-ladsgroup.json
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42368 and previous config saved to /var/cache/conftool/dbconfig/20221206-005244-ladsgroup.json
* 00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
* 00:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42367 and previous config saved to /var/cache/conftool/dbconfig/20221206-005223-ladsgroup.json
* 00:51 cstone: payments-wiki upgraded from {{Gerrit|b613ddfb}} to {{Gerrit|0cd7e779}}
* 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42366 and previous config saved to /var/cache/conftool/dbconfig/20221206-004231-ladsgroup.json
* 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42365 and previous config saved to /var/cache/conftool/dbconfig/20221206-003833-ladsgroup.json
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42364 and previous config saved to /var/cache/conftool/dbconfig/20221206-003716-ladsgroup.json
* 00:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 00:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42363 and previous config saved to /var/cache/conftool/dbconfig/20221206-002945-ladsgroup.json
* 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42362 and previous config saved to /var/cache/conftool/dbconfig/20221206-002326-ladsgroup.json
* 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42361 and previous config saved to /var/cache/conftool/dbconfig/20221206-002210-ladsgroup.json
* 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42360 and previous config saved to /var/cache/conftool/dbconfig/20221206-001438-ladsgroup.json
* 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42359 and previous config saved to /var/cache/conftool/dbconfig/20221206-000820-ladsgroup.json
* 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42358 and previous config saved to /var/cache/conftool/dbconfig/20221206-000703-ladsgroup.json
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42357 and previous config saved to /var/cache/conftool/dbconfig/20221206-000654-ladsgroup.json
* 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42356 and previous config saved to /var/cache/conftool/dbconfig/20221206-000633-ladsgroup.json
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42355 and previous config saved to /var/cache/conftool/dbconfig/20221206-000444-ladsgroup.json
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T322618|T322618]])', diff saved to https://phabricator.wikimedia.org/P42354 and previous config saved to /var/cache/conftool/dbconfig/20221206-000329-ladsgroup.json


== 2022-07-25 ==
== 2022-12-05 ==
* 22:54 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42353 and previous config saved to /var/cache/conftool/dbconfig/20221205-235932-ladsgroup.json
* 22:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 23:57 tzatziki: removing 2 files for legal compliance
* 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31900 and previous config saved to /var/cache/conftool/dbconfig/20220725-224153-ladsgroup.json
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T323907|T323907]])', diff saved to https://phabricator.wikimedia.org/P42352 and previous config saved to /var/cache/conftool/dbconfig/20221205-235724-ladsgroup.json
* 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31899 and previous config saved to /var/cache/conftool/dbconfig/20220725-222648-ladsgroup.json
* 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 22:11 ladsgroup@
* 23:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 23:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42351 and previous config saved to /var/cache/conftool/dbconfig/20221205-235126-ladsgroup.json
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42350 and previous config saved to /var/cache/conftool/dbconfig/20221205-234822-ladsgroup.json
* 23:47 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1d3ba41]:


== 2022-07-24 ==
== 2022-12-04 ==
* 20:54 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org
* 04:19 TheresNoTime: [[phab:T302486|T302486]] : `[samtar@mwmaint1002 ~]$ mwscript maintenance/fixMergeHistoryCorruption.php --wiki enwiki --dry-run --ns 828`
* 20:37 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org
* 14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31815 and previous config saved to /var/cache/conftool/dbconfig/20220724-100221-ladsgroup.json
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31814 and previous config saved to /var/cache/conftool/dbconfig/20220724-094716-ladsgroup.json
* 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31813 and previous config saved to /var/cache/conftool/dbconfig/20220724-093211-ladsgroup.json
* 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31812 and previous config saved to /var/cache/conftool/dbconfig/20220724-091706-ladsgroup.json
* 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31811 and previous config saved to /var/cache/conftool/dbconfig/20220724-041542-ladsgroup.json
* 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31810 and previous config saved to /var/cache/conftool/dbconfig/20220724-040037-ladsgroup.json
* 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31809 and previous config saved to /var/cache/conftool/dbconfig/20220724-034532-ladsgroup.json
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31808 and previous config saved to /var/cache/conftool/dbconfig/20220724-034356-ladsgroup.json
* 03:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 03:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31807 and previous config saved to /var/cache/conftool/dbconfig/20220724-034336-ladsgroup.json
* 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31806 and previous config saved to /var/cache/conftool/dbconfig/20220724-033027-ladsgroup.json
* 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31805 and previous config saved to /var/cache/conftool/dbconfig/20220724-032831-ladsgroup.json
* 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31804 and previous config saved to /var/cache/conftool/dbconfig/20220724-031326-ladsgroup.json
* 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31803 and previous config saved to /var/cache/conftool/dbconfig/20220724-025820-ladsgroup.json
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31802 and previous config saved to /var/cache/conftool/dbconfig/20220724-003718-ladsgroup.json
* 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31801 and previous config saved to /var/cache/conftool/dbconfig/20220724-003652-ladsgroup.json
* 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31800 and previous config saved to /var/cache/conftool/dbconfig/20220724-002147-ladsgroup.json
* 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P31799 and previous config saved to /var/cache/conftool/dbconfig/20220724-000641-ladsgroup.json


== 2022-07-23 ==
== 2022-12-03 ==
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31798 and previous config saved to /var/cache/conftool/dbconfig/20220723-235136-ladsgroup.json
* 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - [[phab:T321410|T321410]]
* 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31797 and previous config saved to /var/cache/conftool/dbconfig/20220723-232948-ladsgroup.json
* 23:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 23:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31796 and previous config saved to /var/cache/conftool/dbconfig/20220723-232927-ladsgroup.json
* 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31795 and previous config saved to /var/cache/conftool/dbconfig/20220723-231422-ladsgroup.json
* 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31794 and previous config saved to /var/cache/conftool/dbconfig/20220723-225917-ladsgroup.json
* 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31793 and previous config saved to /var/cache/conftool/dbconfig/20220723-224412-ladsgroup.json
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31792 and previous config saved to /var/cache/conftool/dbconfig/20220723-220740-ladsgroup.json
* 22:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31791 and previous config saved to /var/cache/conftool/dbconfig/20220723-220720-ladsgroup.json
* 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31790 and previous config saved to /var/cache/conftool/dbconfig/20220723-215215-ladsgroup.json
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31789 and previous config saved to /var/cache/conftool/dbconfig/20220723-213710-ladsgroup.json
* 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31788 and previous config saved to /var/cache/conftool/dbconfig/20220723-213610-ladsgroup.json
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31787 and previous config saved to /var/cache/conftool/dbconfig/20220723-212204-ladsgroup.json
* 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31786 and previous config saved to /var/cache/conftool/dbconfig/20220723-212105-ladsgroup.json
* 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31785 and previous config saved to /var/cache/conftool/dbconfig/20220723-210559-ladsgroup.json
* 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31784 and previous config saved to /var/cache/conftool/dbconfig/20220723-205054-ladsgroup.json
* 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31783 and previous config saved to /var/cache/conftool/dbconfig/20220723-204049-ladsgroup.json
* 20:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 20:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31782 and previous config saved to /var/cache/conftool/dbconfig/20220723-164105-ladsgroup.json
* 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31781 and previous config saved to /var/cache/conftool/dbconfig/20220723-164045-ladsgroup.json
* 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31780 and previous config saved to /var/cache/conftool/dbconfig/20220723-162540-ladsgroup.json
* 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31779 and previous config saved to /var/cache/conftool/dbconfig/20220723-161035-ladsgroup.json
* 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31778 and previous config saved to /var/cache/conftool/dbconfig/20220723-155530-ladsgroup.json
* 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 15:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31777 and previous config saved to /var/cache/conftool/dbconfig/20220723-155311-ladsgroup.json
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31776 and previous config saved to /var/cache/conftool/dbconfig/20220723-153805-ladsgroup.json
* 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31775 and previous config saved to /var/cache/conftool/dbconfig/20220723-152300-ladsgroup.json
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31774 and previous config saved to /var/cache/conftool/dbconfig/20220723-151951-ladsgroup.json
* 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31773 and previous config saved to /var/cache/conftool/dbconfig/20220723-151930-ladsgroup.json
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31772 and previous config saved to /var/cache/conftool/dbconfig/20220723-150754-ladsgroup.json
* 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31771 and previous config saved to /var/cache/conftool/dbconfig/20220723-150425-ladsgroup.json
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31770 and previous config saved to /var/cache/conftool/dbconfig/20220723-144920-ladsgroup.json
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31769 and previous config saved to /var/cache/conftool/dbconfig/20220723-143414-ladsgroup.json
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31768 and previous config saved to /var/cache/conftool/dbconfig/20220723-105825-ladsgroup.json
* 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31767 and previous config saved to /var/cache/conftool/dbconfig/20220723-105805-ladsgroup.json
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31766 and previous config saved to /var/cache/conftool/dbconfig/20220723-105257-ladsgroup.json
* 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31765 and previous config saved to /var/cache/conftool/dbconfig/20220723-105238-ladsgroup.json
* 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31764 and previous config saved to /var/cache/conftool/dbconfig/20220723-105228-ladsgroup.json
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31763 and previous config saved to /var/cache/conftool/dbconfig/20220723-104300-ladsgroup.json
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31762 and previous config saved to /var/cache/conftool/dbconfig/20220723-103733-ladsgroup.json
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31761 and previous config saved to /var/cache/conftool/dbconfig/20220723-103723-ladsgroup.json
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31760 and previous config saved to /var/cache/conftool/dbconfig/20220723-102755-ladsgroup.json
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31759 and previous config saved to /var/cache/conftool/dbconfig/20220723-102227-ladsgroup.json
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31758 and previous config saved to /var/cache/conftool/dbconfig/20220723-102218-ladsgroup.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31757 and previous config saved to /var/cache/conftool/dbconfig/20220723-101250-ladsgroup.json
* 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31756 and previous config saved to /var/cache/conftool/dbconfig/20220723-100722-ladsgroup.json
* 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31755 and previous config saved to /var/cache/conftool/dbconfig/20220723-100713-ladsgroup.json
* 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31754 and previous config saved to /var/cache/conftool/dbconfig/20220723-095241-ladsgroup.json
* 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31753 and previous config saved to /var/cache/conftool/dbconfig/20220723-053604-ladsgroup.json
* 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31752 and previous config saved to /var/cache/conftool/dbconfig/20220723-052925-ladsgroup.json
* 05:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31751 and previous config saved to /var/cache/conftool/dbconfig/20220723-015300-ladsgroup.json
* 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31750 and previous config saved to /var/cache/conftool/dbconfig/20220723-013755-ladsgroup.json
* 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31749 and previous config saved to /var/cache/conftool/dbconfig/20220723-012250-ladsgroup.json
* 01:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31748 and previous config saved to /var/cache/conftool/dbconfig/20220723-010745-ladsgroup.json
* 00:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
* 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
* 00:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31747 and previous config saved to /var/cache/conftool/dbconfig/20220723-001125-ladsgroup.json

== 2022-07-22 ==
== 2022-12-02 ==
* 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31746 and previous config saved to /var/cache/conftool/dbconfig/20220722-235619-ladsgroup.json
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31745 and previous config saved to /var/cache/conftool/dbconfig/20220722-234114-ladsgroup.json
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31744 and previous config saved to /var/cache/conftool/dbconfig/20220722-232609-ladsgroup.json
* 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31743 and previous config saved to /var/cache/conftool/dbconfig/20220722-215349-ladsgroup.json
* 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31742 and previous config saved to /var/cache/conftool/dbconfig/20220722-215329-ladsgroup.json
* 19:36 volans: fixed git checkout permissions [[phab:T324334|T324334]]
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31741 and previous config saved to /var/cache/conftool/dbconfig/20220722-213824-ladsgroup.json
* 19:11 sukhe: restart pybal on lvs5004
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31740 and previous config saved to /var/cache/conftool/dbconfig/20220722-212319-ladsgroup.json
* 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31739 and previous config saved to /var/cache/conftool/dbconfig/20220722-211308-ladsgroup.json
* 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
* 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
* 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31738 and previous config saved to /var/cache/conftool/dbconfig/20220722-211259-ladsgroup.json
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31737 and previous config saved to /var/cache/conftool/dbconfig/20220722-210813-ladsgroup.json
* 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 21:05 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 29s)
* 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 21:04 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
* 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31736 and previous config saved to /var/cache/conftool/dbconfig/20220722-205754-ladsgroup.json
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 20:44 brennen@deploy1002: Finished deploy [phabricator/deployment@f962d0e]: (no justification provided) (duration: 00m 07s)
* 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 20:44 brennen@deploy1002: Started deploy [phabricator/deployment@f962d0e]: (no justification provided)
* 18:20 sukhe: decomm lvs5001: restarting pybal
* 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31735 and previous config saved to /var/cache/conftool/dbconfig/20220722-204248-ladsgroup.json
* 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
* 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 20:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
* 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31734 and previous config saved to /var/cache/conftool/dbconfig/20220722-203708-ladsgroup.json
* 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - [[phab:T324334|T324334]]
* 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31733 and previous config saved to /var/cache/conftool/dbconfig/20220722-202743-ladsgroup.json
* 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
* 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31732 and previous config saved to /var/cache/conftool/dbconfig/20220722-202203-ladsgroup.json
* 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31731 and previous config saved to /var/cache/conftool/dbconfig/20220722-200658-ladsgroup.json
* 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
* 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31730 and previous config saved to /var/cache/conftool/dbconfig/20220722-195153-ladsgroup.json
* 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31729 and previous config saved to /var/cache/conftool/dbconfig/20220722-194428-ladsgroup.json
* 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
* 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
* 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
* 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 19:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31727 and previous config saved to /var/cache/conftool/dbconfig/20220722-173218-ladsgroup.json
* 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 17:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 16:54 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: no-op deploy to sync up new cloudweb hosts (duration: 08m 47s)
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 16:45 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: no-op deploy to sync up new cloudweb hosts
* 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 16:19 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
* 16:02 jbond: puppet-agent to puppet7 component
* 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:57 jbond: ruby-semantic-puppet to puppet7 component
* 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 15:49 jbond: ruby-sorted-set to puppet7 component
* 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2046.codfw.wmnet with OS bullseye
* 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31725 and previous config saved to /var/cache/conftool/dbconfig/20220722-150727-ladsgroup.json
* 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31724 and previous config saved to /var/cache/conftool/dbconfig/20220722-150707-ladsgroup.json
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2046.codfw.wmnet with reason: host reimage
* 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:03 jbond: ruby-rbtree to puppet7 component
* 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - [[phab:T324334|T324334]]
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
* 15:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2046.codfw.wmnet with reason: host reimage
* 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31722 and previous config saved to /var/cache/conftool/dbconfig/20220722-145201-ladsgroup.json
* 12:09 jynus: dropping all databases from db1133
* 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31721 and previous config saved to /var/cache/conftool/dbconfig/20220722-144734-ladsgroup.json
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
* 14:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2046.codfw.wmnet with OS bullseye
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31720 and previous config saved to /var/cache/conftool/dbconfig/20220722-143655-ladsgroup.json
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31719 and previous config saved to /var/cache/conftool/dbconfig/20220722-143229-ladsgroup.json
* 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:29 moritzm: restarting tomcat on idp-test.w.o
* 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31718 and previous config saved to /var/cache/conftool/dbconfig/20220722-142150-ladsgroup.json
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
* 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31717 and previous config saved to /var/cache/conftool/dbconfig/20220722-141724-ladsgroup.json
* 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2033.codfw.wmnet with OS bullseye
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
* 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - [[phab:T321309|T321309]]
* 13:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
* 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
* 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31713 and previous config saved to /var/cache/conftool/dbconfig/20220722-131710-ladsgroup.json
* 09:54 moritzm: installing debootstrap updates from bullseye point release
* 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:53 moritzm: rebalance ganeti codfw/C [[phab:T323222|T323222]]
* 13:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31712 and previous config saved to /var/cache/conftool/dbconfig/20220722-131650-ladsgroup.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
* 13:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
* 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31711 and previous config saved to /var/cache/conftool/dbconfig/20220722-130145-ladsgroup.json
* 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
* 12:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31710 and previous config saved to /var/cache/conftool/dbconfig/20220722-124640-ladsgroup.json
* 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
* 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
* 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31708 and previous config saved to /var/cache/conftool/dbconfig/20220722-102452-ladsgroup.json
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:21 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
* 07:41 moritzm: draining ganeti5001 for eventual decom [[phab:T322048|T322048]]
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45{{!}}46).eqiad.wmnet,cluster=jobrunner
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39{{!}}40).eqiad.wmnet,cluster=videoscaler
* 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2021.codfw.wmnet to cluster codfw and group B
* 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31707 and previous config saved to /var/cache/conftool/dbconfig/20220722-100948-ladsgroup.json
* 10:06 XioNoX: push pfw policies - [[phab:T313522|T313522]]
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31706 and previous config saved to /var/cache/conftool/dbconfig/20220722-095444-ladsgroup.json
* 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31705 and previous config saved to /var/cache/conftool/dbconfig/20220722-093940-ladsgroup.json
* 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31704 and previous config saved to /var/cache/conftool/dbconfig/20220722-093754-ladsgroup.json
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
* 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31702 and previous config saved to /var/cache/conftool/dbconfig/20220722-093453-ladsgroup.json
* 09:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31701 and previous config saved to /var/cache/conftool/dbconfig/20220722-084647-ladsgroup.json
* 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31700 and previous config saved to /var/cache/conftool/dbconfig/20220722-084627-ladsgroup.json
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 12 hosts with reason: Maintenance
* 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 12 hosts with reason: Maintenance
* 08:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31697 and previous config saved to /var/cache/conftool/dbconfig/20220722-080112-ladsgroup.json
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31696 and previous config saved to /var/cache/conftool/dbconfig/20220722-074844-ladsgroup.json
* 07:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 07:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 06:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2014.codfw.wmnet with OS bullseye
* 06:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
* 05:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2014.codfw.wmnet with OS bullseye
* 05:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2021.codfw.wmnet with reason: host reimage
* 05:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2021.codfw.wmnet with reason: host reimage
* 05:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2021.codfw.wmnet with OS bullseye
* 05:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 05:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 04:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 04:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31694 and previous config saved to /var/cache/conftool/dbconfig/20220722-045543-ladsgroup.json
* 04:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31693 and previous config saved to /var/cache/conftool/dbconfig/20220722-044038-ladsgroup.json
* 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31692 and previous config saved to /var/cache/conftool/dbconfig/20220722-042533-ladsgroup.json
* 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31691 and previous config saved to /var/cache/conftool/dbconfig/20220722-041028-ladsgroup.json
* 04:05 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: disable debug log on test2wiki (cleanup) (duration: 03m 05s)
* 04:01 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I9051d20cd1}} (duration: 03m 02s)
* 03:58 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I9051d20cd1}} (duration: 03m 10s)
* 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31690 and previous config saved to /var/cache/conftool/dbconfig/20220722-031014-ladsgroup.json
* 03:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31689 and previous config saved to /var/cache/conftool/dbconfig/20220722-030954-ladsgroup.json
* 03:09 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: disable debug log on test2wiki (duration: 02m 47s)
* 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31688 and previous config saved to /var/cache/conftool/dbconfig/20220722-025449-ladsgroup.json
* 02:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31687 and previous config saved to /var/cache/conftool/dbconfig/20220722-023943-ladsgroup.json
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31685 and previous config saved to /var/cache/conftool/dbconfig/20220722-002622-ladsgroup.json
* 00:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31684 and previous config saved to /var/cache/conftool/dbconfig/20220722-002601-ladsgroup.json
* 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31683 and previous config saved to /var/cache/conftool/dbconfig/20220722-001056-ladsgroup.json


== 2022-07-21 ==
* 23:53 mutante: https://policy.wikimedia.org moved from Wordpress DNS back to WMF DNS - now redirects to https://wikimediafoundation.org/advocacy/ as requested on [[phab:T310738|T310738]] {{!}} this might also resolve [[phab:T132104|T132104]] or not because wikimediafoundation.org is also on wordpress VIP
* 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P31680 and previous config saved to /var/cache/conftool/dbconfig/20220721-234045-ladsgroup.json
* 23:22 mutante: [cumin2002:~] $ sudo cumin 'C:profile::httpbb' "rm /srv/deployment/httpbb-tests/appserver/test_search.yaml"
* 23:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2045.codfw.wmnet with OS bullseye
* 22:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2045.codfw.wmnet with reason: host reimage
* 22:52 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2045.codfw.wmnet with reason: host reimage
* 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31678 and


== 2022-12-01 ==
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
* 22:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:856008{{!}}GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue]] (duration: 07m 28s)
* 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
* 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:54 rzl@cumin1001: END (PASS


== 2022-07-20 ==
* 23:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS bullseye
* 23:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS bullseye
* 23:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS bullseye
* 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS bullseye
* 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS bullseye
* 23:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2072.codfw.wmnet with reason: host reimage
* 23:29 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2070.codfw.wmnet with reason: host reimage
* 23:29 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic2067.codfw.wmnet with reason: host reimage
* 23:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2071.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2068.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2071.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2072.codfw.wmnet with reason: host reimage
* 23:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2070.codfw.wmnet with reason: host reimage
* 23:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2067.codfw.wmnet with reason: host reimage
* 23:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2068.codfw.wmnet with reason: host reimage
* 23:11 ryankemper: [[phab:T300943|T300943]] Fixed IPMI passwords for elastic `20[67,68,70,71,72]`, reimaging them to bullseye (these hosts are not in service, thus the batch operation)
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS bullseye
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS bullseye
* 23:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS bullseye
* 23:07 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS bullseye
* 23:07 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS bullseye
* 21:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 21:45 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 20:45 cjming: end of UTC late backport window
* 20:43 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814906{{!}}Deploy the new grid layout to group 1 (T312241)]] (duration: 03m 16s)
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814906{{!}}Deploy the new grid layout to group 1 (T312241)]] (duration: 03m 14s)
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2032.codfw.wmnet with OS bullseye
* 20:27 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815359{{!}}Enable DiscussionTools visualenhancements as beta feature on partner wikis (T312670)]] (duration: 03m 26s)
* 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31555 and previous config saved to /var/cache/conftool/dbconfig/20220720-201240-marostegui.json
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2032.codfw.wmnet with reason: host reimage
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815359{{!}}Enable DiscussionTools visualenhancements as beta feature on partner wikis (T312670)]] (duration: 03m 10s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:08 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2032.codfw.wmnet with reason: host reimage
* 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31554 and previous config saved to /var/cache/conftool/dbconfig/20220720-195734-marostegui.json
* 19:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2032.codfw.wmnet with OS bullseye
* 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]] (duration: 02m 53s)
* 19:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31553 and previous config saved to /var/cache/conftool/dbconfig/20220720-194229-marostegui.json
* 19:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:33 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/3D/src/PatentFormField.php: Backport: [[gerrit:815733{{!}}PatentFormField: pass on $this->mParent to HTMLRadioField constructor (T313432)]] (duration: 03m 08s)
* 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31552 and previous config saved to /var/cache/conftool/dbconfig/20220720-192724-marostegui.json
* 19:17 jeena: that should be revert group1 wikis to 1.39.0-wmf.19
* 19:13 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group[0{{!}}1] wikis to [VERSION]"
* 18:37 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 18:35 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2045.codfw.wmnet with OS bullseye
* 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31551 and previous config saved to /var/cache/conftool/dbconfig/20220720-182710-marostegui.json
* 18:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 18:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 18:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 15 hosts with reason: Maintenance
* 18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 15 hosts with reason: Maintenance
* 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 18:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31550 and previous config saved to /var/cache/conftool/dbconfig/20220720-182339-marostegui.json
* 18:17 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2045.codfw.wmnet with OS bullseye
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:16 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]] (duration: 03m 07s)
* 18:15 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.21  refs [[phab:T308074|T308074]]
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31549 and previous config saved to /var/cache/conftool/dbconfig/20220720-180834-marostegui.json
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31548 and previous config saved to /var/cache/conftool/dbconfig/20220720-175328-marostegui.json
* 17:51 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 17:50 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 17:38 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 17:38 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2048.codfw.wmnet with OS bullseye
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31547 and previous config saved to /var/cache/conftool/dbconfig/20220720-173823-marostegui.json
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31546 and previous config saved to /var/cache/conftool/dbconfig/20220720-173522-marostegui.json
* 17:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 17:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31545 and previous config saved to /var/cache/conftool/dbconfig/20220720-173502-marostegui.json
* 17:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2048.codfw.wmnet with reason: host reimage
* 17:25 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2048.codfw.wmnet with reason: host reimage
* 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31544 and previous config saved to /var/cache/conftool/dbconfig/20220720-171956-marostegui.json
* 17:12 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'enable-puppet 815759'
* 17:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2048.codfw.wmnet with OS bullseye
* 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31543 and previous config saved to /var/cache/conftool/dbconfig/20220720-170451-marostegui.json
* 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31542 and previous config saved to /var/cache/conftool/dbconfig/20220720-164946-marostegui.json
* 16:49 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'disable-puppet 815759'
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31541 and previous config saved to /var/cache/conftool/dbconfig/20220720-164638-marostegui.json
* 16:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 16:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31540 and previous config saved to /var/cache/conftool/dbconfig/20220720-164618-marostegui.json
* 16:40 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31539 and previous config saved to /var/cache/conftool/dbconfig/20220720-163113-marostegui.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31538 and previous config saved to /var/cache/conftool/dbconfig/20220720-161608-marostegui.json
* 16:05 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31537 and previous config saved to /var/cache/conftool/dbconfig/20220720-160103-marostegui.json
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31536 and previous config saved to /var/cache/conftool/dbconfig/20220720-155752-marostegui.json
* 15:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 15:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31535 and previous config saved to /var/cache/conftool/dbconfig/20220720-155732-marostegui.json
* 15:57 dancy@deploy1002: Installation of scap version "4.11.2" completed for 557 hosts
* 15:56 dancy@deploy1002: Installing scap version "4.11.2" for 557 hosts
* 15:50 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
* 15:46 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2036.codfw.wmnet with OS bullseye
* 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31534 and previous config saved to /var/cache/conftool/dbconfig/20220720-154227-marostegui.json
* 15:39 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing
* 15:35 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
* 15:28 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31532 and previous config saved to /var/cache/conftool/dbconfig/20220720-152721-marostegui.json
* 15:26 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 15:26 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 15:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2036.codfw.wmnet with reason: host reimage
* 15:20 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2036.codfw.wmnet with reason: host reimage
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fix db2167:3318', diff saved to https://phabricator.wikimedia.org/P31531 and previous config saved to /var/cache/conftool/dbconfig/20220720-151711-marostegui.json
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31530 and previous config saved to /var/cache/conftool/dbconfig/20220720-151216-marostegui.json
* 15:10 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31529 and previous config saved to /var/cache/conftool/dbconfig/20220720-150908-marostegui.json
* 15:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 15:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31528 and previous config saved to /var/cache/conftool/dbconfig/20220720-150730-marostegui.json
* 15:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2036.codfw.wmnet with OS bullseye
* 14:59 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31527 and previous config saved to /var/cache/conftool/dbconfig/20220720-145224-marostegui.json
* 14:44 volans: installing spicerack 3.1.0 on cumin2002
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31524 and previous config saved to /var/cache/conftool/dbconfig/20220720-143719-marostegui.json
* 14:36 volans: uploaded spicerack_3.1.0 to apt.wikimedia.org bullseye-wikimedia
* 14:26 moritzm: installing containerd security updates in Kubernetes codfw masters
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31523 and previous config saved to /var/cache/conftool/dbconfig/20220720-142214-marostegui.json
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31522 and previous config saved to /var/cache/conftool/dbconfig/20220720-141912-marostegui.json
* 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31521 and previous config saved to /var/cache/conftool/dbconfig/20220720-141851-marostegui.json
* 14:04 Lucas_WMDE: UTC afternoon backport+config window done
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31520 and previous config saved to /var/cache/conftool/dbconfig/20220720-140346-marostegui.json
* 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/WikibaseLexeme/WikibaseLexeme.resources.php: Backport: [[gerrit:815726{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (2/2) (duration: 03m 02s)
* 14:02 jbond: disable puppet on A:cp to deploy Gerrit:768766
* 13:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/WikibaseLexeme/src/MediaWiki/Config/LexemeLanguageCodePropertyIdConfig.php: Backport: [[gerrit:815726{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (1/2) (duration: 02m 56s)
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:54 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ git -C php-1.39.0-wmf.19/extensions/WikibaseLexeme am --skip # [[phab:T308659|T308659]] backport already applied
* 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2034.codfw.wmnet with OS bullseye
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31519 and previous config saved to /var/cache/conftool/dbconfig/20220720-134841-marostegui.json
* 13:45 moritzm: installing containerd security updates in Kubernetes codfw cluster
* 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/WikibaseLexeme/WikibaseLexeme.resources.php: Backport: [[gerrit:815425{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (2/2) (duration: 03m 08s)
* 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/WikibaseLexeme/src/MediaWiki/Config/LexemeLanguageCodePropertyIdConfig.php: Backport: [[gerrit:815425{{!}}Load Special:NewLexemeAlpha RL modules on mobile (T313116)]] (1/2) (duration: 03m 34s)
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 moritzm: installing request-tracker4 security updates
* 13:33 XioNoX: cr2-eqiad# deactivate interfaces xe-3/3/0 - [[phab:T313337|T313337]]
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31518 and previous config saved to /var/cache/conftool/dbconfig/20220720-133336-marostegui.json
* 13:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2034.codfw.wmnet with reason: host reimage
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31517 and previous config saved to /var/cache/conftool/dbconfig/20220720-133030-marostegui.json
* 13:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 13:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31516 and previous config saved to /var/cache/conftool/dbconfig/20220720-133010-marostegui.json
* 13:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2034.codfw.wmnet with reason: host reimage
* 13:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2034.codfw.wmnet with OS bullseye
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31515 and previous config saved to /var/cache/conftool/dbconfig/20220720-131505-marostegui.json
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31514 and previous config saved to /var/cache/conftool/dbconfig/20220720-130000-marostegui.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31513 and previous config saved to /var/cache/conftool/dbconfig/20220720-124453-marostegui.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31512 and previous config saved to /var/cache/conftool/dbconfig/20220720-124042-marostegui.json
* 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 12:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 12:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31511 and previous config saved to /var/cache/conftool/dbconfig/20220720-123751-marostegui.json
* 12:29 marostegui: Move pc1014 from pc2 to pc3 [[phab:T313401|T313401]]
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31510 and previous config saved to /var/cache/conftool/dbconfig/20220720-122246-marostegui.json
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31509 and previous config saved to /var/cache/conftool/dbconfig/20220720-120738-marostegui.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31507 and previous config saved to /var/cache/conftool/dbconfig/20220720-115233-marostegui.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31506 and previous config saved to /var/cache/conftool/dbconfig/20220720-113424-marostegui.json
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 11:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2009.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 11:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2009.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
* 11:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.1 - ayounsi@cumin1001
* 11:05 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.1 - ayounsi@cumin1001
* 11:03 moritzm: draining ganeti2014 [[phab:T310483|T310483]]
* 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
* 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2020.codfw.wmnet with OS bullseye
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 12 hosts with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 12 hosts with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31504 and previous config saved to /var/cache/conftool/dbconfig/20220720-103825-marostegui.json
* 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2020.codfw.wmnet with reason: host reimage
* 10:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2020.codfw.wmnet with reason: host reimage
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31503 and previous config saved to /var/cache/conftool/dbconfig/20220720-102320-marostegui.json
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2003.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2020.codfw.wmnet with OS bullseye
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31502 and previous config saved to /var/cache/conftool/dbconfig/20220720-100815-marostegui.json
* 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2029.codfw.wmnet to cluster codfw and group A
* 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to cluster codfw and group A
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31501 and previous config saved to /var/cache/conftool/dbconfig/20220720-095310-marostegui.json
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 09:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31499 and previous config saved to /var/cache/conftool/dbconfig/20220720-085256-marostegui.json
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31498 and previous config saved to /var/cache/conftool/dbconfig/20220720-085236-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31497 and previous config saved to /var/cache/conftool/dbconfig/20220720-083731-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31496 and previous config saved to /var/cache/conftool/dbconfig/20220720-082226-marostegui.json
* 08:14 elukey: apt-get clean on archiva1002 to free some space
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31495 and previous config saved to /var/cache/conftool/dbconfig/20220720-080721-marostegui.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31494 and previous config saved to /var/cache/conftool/dbconfig/20220720-080509-marostegui.json
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31493 and previous config saved to /var/cache/conftool/dbconfig/20220720-080442-marostegui.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31492 and previous config saved to /var/cache/conftool/dbconfig/20220720-074937-marostegui.json
* 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
* 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
* 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31491 and previous config saved to /var/cache/conftool/dbconfig/20220720-073432-marostegui.json
* 07:31 jayme: ml-serve1002.eqiad.wmnet,ml-serve1004.eqiad.wmnet 'systemctl restart rsyslog'
* 07:30 taavi@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/SecurePoll/cli/wm-scripts/bv2022/populateEditCount.php: [[phab:T309753|T309753]] backports (duration: 02m 54s)
* 07:30 jayme: kubernetes1010.eqiad.wmnet,kubernetes1020.eqiad.wmnet 'systemctl restart rsyslog'
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 taavi@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/bv2022/: [[phab:T309753|T309753]] backports (duration: 02m 57s)
* 07:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31490 and previous config saved to /var/cache/conftool/dbconfig/20220720-071927-marostegui.json
* 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2029.codfw.wmnet with OS bullseye
* 07:14 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815251{{!}}Enable ContentTranslation out of Beta for sswiki (T309384)]] (duration: 03m 24s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31489 and previous config saved to /var/cache/conftool/dbconfig/20220720-071114-marostegui.json
* 07:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31488 and previous config saved to /var/cache/conftool/dbconfig/20220720-071054-marostegui.json
* 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
* 06:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2029.codfw.wmnet with reason: host reimage
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31487 and previous config saved to /var/cache/conftool/dbconfig/20220720-065549-marostegui.json
* 06:43 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS bullseye
* 06:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2020.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31486 and previous config saved to /var/cache/conftool/dbconfig/20220720-064044-marostegui.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31485 and previous config saved to /var/cache/conftool/dbconfig/20220720-062539-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31484 and previous config saved to /var/cache/conftool/dbconfig/20220720-062327-marostegui.json
* 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31483 and previous config saved to /var/cache/conftool/dbconfig/20220720-062307-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31482 and previous config saved to /var/cache/conftool/dbconfig/20220720-060802-marostegui.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31481 and previous config saved to /var/cache/conftool/dbconfig/20220720-055256-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31480 and previous config saved to /var/cache/conftool/dbconfig/20220720-053751-marostegui.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31479 and previous config saved to /var/cache/conftool/dbconfig/20220720-053620-marostegui.json
* 05:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31478 and previous config saved to /var/cache/conftool/dbconfig/20220720-053520-marostegui.json
* 05:26 marostegui: Stop mysql on db2087 (s6 and s7) to clone db2169 [[phab:T311493|T311493]]
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31475 and previous config saved to /var/cache/conftool/dbconfig/20220720-052014-marostegui.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31474 and previous config saved to /var/cache/conftool/dbconfig/20220720-050509-marostegui.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2168 to dbctl in s7 and s8 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31473 and previous config saved to /var/cache/conftool/dbconfig/20220720-045918-marostegui.json
* 04:57 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31472 and previous config saved to /var/cache/conftool/dbconfig/20220720-045004-marostegui.json
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31471 and previous config saved to /var/cache/conftool/dbconfig/20220720-044729-marostegui.json
* 04:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 04:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 04:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 04:10 rzl: rzl@kubemaster1001:~$ sudo systemctl restart kube-apiserver
* 04:08 rzl: rzl@kubemaster1002:~$ sudo systemctl restart kube-apiserver
* 03:48 rzl: rzl@cumin2002:~$ sudo cumin dbproxy[1019,1020,1021].eqiad.wmnet 'systemctl reload haproxy'
* 03:37 rzl: rzl@dbproxy1018:~$ sudo systemctl reload haproxy
* 03:30 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 03:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host elastic2060.codfw.wmnet with OS bullseye
* 03:19 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
* 03:10 tstarling@deploy1002: Finished scap: revert yue -> zh fallback, needs LC rebuild in both branches [[phab:T296188|T296188]] (duration: 19m 41s)
* 02:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:51 tstarling@deploy1002: Started scap: revert yue -> zh fallback, needs LC rebuild in both branches [[phab:T296188|T296188]]
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2052.codfw.wmnet with OS bullseye
* 01:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2052.codfw.wmnet with reason: host reimage
* 01:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2052.codfw.wmnet with reason: host reimage
* 01:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2052.codfw.wmnet with OS bullseye
* 01:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS bullseye
* 00:43 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2051.codfw.wmnet with reason: host reimage
* 00:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2051.codfw.wmnet with reason: host reimage
* 00:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS bullseye
 
== 2022-07-19 ==
* 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 8 hosts with reason: Maintenance
* 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 8 hosts with reason: Maintenance
* 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31470 and previous config saved to /var/cache/conftool/dbconfig/20220719-225828-marostegui.json
* 22:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2050.codfw.wmnet with OS bullseye
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31469 and previous config saved to /var/cache/conftool/dbconfig/20220719-224323-marostegui.json
* 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2050.codfw.wmnet with reason: host reimage
* 22:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2050.codfw.wmnet with reason: host reimage
* 22:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31468 and previous config saved to /var/cache/conftool/dbconfig/20220719-222818-marostegui.json
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31467 and previous config saved to /var/cache/conftool/dbconfig/20220719-221312-marostegui.json
* 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31466 and previous config saved to /var/cache/conftool/dbconfig/20220719-221035-marostegui.json
* 22:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2050.codfw.wmnet with OS bullseye
* 22:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31465 and previous config saved to /var/cache/conftool/dbconfig/20220719-220946-marostegui.json
* 22:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31464 and previous config saved to /var/cache/conftool/dbconfig/20220719-215441-marostegui.json
* 21:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 21:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31463 and previous config saved to /var/cache/conftool/dbconfig/20220719-213936-marostegui.json
* 21:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2026.codfw.wmnet with OS bullseye
* 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:36 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]] (duration: 04m 02s)
* 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 21:26 dancy@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Config: [[gerrit:815317{{!}}MWConfigCacheGenerator: If opcache.revalidate_freq is 0, use grace period of 10 seconds (T311788)]] (duration: 02m 59s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31462 and previous config saved to /var/cache/conftool/dbconfig/20220719-212431-marostegui.json
* 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31461 and previous config saved to /var/cache/conftool/dbconfig/20220719-212149-marostegui.json
* 21:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 21:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31460 and previous config saved to /var/cache/conftool/dbconfig/20220719-212128-marostegui.json
* 21:17 jforrester@deploy1002: Synchronized php-1.39.0-wmf.21/extensions/Scribunto/includes/Hooks.php: Train unblocker: [[gerrit:815281{{!}}Hooks: Bump scribunto-stats cache version (T313341)]] (duration: 03m 14s)
* 21:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2026.codfw.wmnet with reason: host reimage
* 21:14 cjming: end of UTC late backport window
* 21:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2026.codfw.wmnet with reason: host reimage
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815374{{!}}uzwiki: Create "eliminator" group (T302670)]] (duration: 03m 13s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:07 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815374{{!}}uzwiki: Create "eliminator" group (T302670)]] (duration: 03m 19s)
* 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31459 and previous config saved to /var/cache/conftool/dbconfig/20220719-210623-marostegui.json
* 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2026.codfw.wmnet with OS bullseye
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:00 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:776334{{!}}Add "uploader" user group for kswiki. (T305320)]] (duration: 02m 58s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:774841{{!}}Add file mover user group for azwiki (T304968)]] (duration: 02m 52s)
* 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31458 and previous config saved to /var/cache/conftool/dbconfig/20220719-205118-marostegui.json
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:774841{{!}}Add file mover user group for azwiki (T304968)]] (duration: 03m 15s)
* 20:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2055.codfw.wmnet with OS bullseye
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:36 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:815360{{!}} Bumping portals to master (T128546)]] (duration: 02m 53s)
* 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31457 and previous config saved to /var/cache/conftool/dbconfig/20220719-203613-marostegui.json
* 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31456 and previous config saved to /var/cache/conftool/dbconfig/20220719-203327-marostegui.json
* 20:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:815360{{!}} Bumping portals to master (T128546)]] (duration: 03m 09s)
* 20:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31455 and previous config saved to /var/cache/conftool/dbconfig/20220719-203307-marostegui.json
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:29 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:815338{{!}}[wmf-config]: Undeploy GDI Survey Wave 2 (T312866)]] (duration: 03m 12s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2055.codfw.wmnet with reason: host reimage
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:23 cjming@deploy1002: Synchronized wmf-config: Config: [[gerrit:814869{{!}}Deploy the new grid layout to group 0 wikis (T312241)]] (duration: 03m 05s)
* 20:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2055.codfw.wmnet with reason: host reimage
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31454 and previous config saved to /var/cache/conftool/dbconfig/20220719-201802-marostegui.json
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814908{{!}}cirrus: Dont recycle completion suggester indices]] (duration: 03m 12s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "testwikis to 1.39.0-wmf.19"
* 20:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2055.codfw.wmnet with OS bullseye
* 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31453 and previous config saved to /var/cache/conftool/dbconfig/20220719-200257-marostegui.json
* 20:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:51 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.19"
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31452 and previous config saved to /var/cache/conftool/dbconfig/20220719-194752-marostegui.json
* 19:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2056.codfw.wmnet with OS bullseye
* 19:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 19:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2069.codfw.wmnet with OS bullseye
* 19:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31451 and previous config saved to /var/cache/conftool/dbconfig/20220719-192207-marostegui.json
* 19:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 19:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 19:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31450 and previous config saved to /var/cache/conftool/dbconfig/20220719-192147-marostegui.json
* 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2069.codfw.wmnet with reason: host reimage
* 19:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31449 and previous config saved to /var/cache/conftool/dbconfig/20220719-190642-marostegui.json
* 19:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2056.codfw.wmnet with reason: host reimage
* 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:04 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2069.codfw.wmnet with reason: host reimage
* 19:02 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 19:02 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2056.codfw.wmnet with reason: host reimage
* 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31448 and previous config saved to /var/cache/conftool/dbconfig/20220719-185137-marostegui.json
* 18:50 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2069.codfw.wmnet with OS bullseye
* 18:49 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.17, 1.39.0-wmf.18 (duration: 02m 09s)
* 18:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:44 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2056.codfw.wmnet with OS bullseye
* 18:42 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - [[phab:T289135|T289135]]
* 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31447 and previous config saved to /var/cache/conftool/dbconfig/20220719-183632-marostegui.json
* 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31446 and previous config saved to /var/cache/conftool/dbconfig/20220719-183351-marostegui.json
* 18:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 18:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31445 and previous config saved to /var/cache/conftool/dbconfig/20220719-183330-marostegui.json
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:18 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
* 18:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31444 and previous config saved to /var/cache/conftool/dbconfig/20220719-181825-marostegui.json
* 18:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 18:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31443 and previous config saved to /var/cache/conftool/dbconfig/20220719-180320-marostegui.json
* 17:51 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]] (duration: 04m 24s)
* 17:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31442 and previous config saved to /var/cache/conftool/dbconfig/20220719-174815-marostegui.json
* 17:46 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31441 and previous config saved to /var/cache/conftool/dbconfig/20220719-174537-marostegui.json
* 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 17:45 jhuneidi@deploy1002: Installation of scap version "4.11.1" completed for 557 hosts
* 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31440 and previous config saved to /var/cache/conftool/dbconfig/20220719-174517-marostegui.json
* 17:45 jhuneidi@deploy1002: Installing scap version "4.11.1" for 557 hosts
* 17:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31439 and previous config saved to /var/cache/conftool/dbconfig/20220719-173012-marostegui.json
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31438 and previous config saved to /var/cache/conftool/dbconfig/20220719-171507-marostegui.json
* 17:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 jhuneidi@deploy1002: scap failed: ValueError php_fpm expected targets, 0 given (duration: 37m 54s)
* 17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31437 and previous config saved to /var/cache/conftool/dbconfig/20220719-170002-marostegui.json
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31436 and previous config saved to /var/cache/conftool/dbconfig/20220719-165747-marostegui.json
* 16:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 16:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 16:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 16:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 16:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 16:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1028.eqiad.wmnet
* 16:43 XioNoX: cr2-eqiad# run request chassis fpc slot 3 offline
* 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1028.eqiad.wmnet
* 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:28 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 16:23 jhuneidi@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/srv/mediawiki-staging/php-1.39.0-wmf.19/cache/gitinfo/info-extensions-FileImporter.json' (duration: 00m 00s)
* 16:23 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 16:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 16:18 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 8 hosts with reason: Maintenance
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on 8 hosts with reason: Maintenance
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 16:18 XioNoX: drain traffic away from cr2-eqiad:fpc3 - [[phab:T312745|T312745]]
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31435 and previous config saved to /var/cache/conftool/dbconfig/20220719-161803-marostegui.json
* 16:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:14 jhuneidi@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/srv/mediawiki-staging/php-1.39.0-wmf.19/cache/gitinfo/info-extensions-GrowthExperiments.json' (duration: 00m 00s)
* 16:14 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.21 refs [[phab:T308074|T308074]]
* 16:04 moritzm: installing node-minimist security updates
* 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31434 and previous config saved to /var/cache/conftool/dbconfig/20220719-160258-marostegui.json
* 15:58 moritzm: draining ganeti2020 [[phab:T310483|T310483]]
* 15:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2029.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 15:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2029.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 15:56 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
* 15:55 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
* 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31433 and previous config saved to /var/cache/conftool/dbconfig/20220719-154753-marostegui.json
* 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31432 and previous config saved to /var/cache/conftool/dbconfig/20220719-153248-marostegui.json
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31431 and previous config saved to /var/cache/conftool/dbconfig/20220719-153040-marostegui.json
* 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31430 and previous config saved to /var/cache/conftool/dbconfig/20220719-153009-marostegui.json
* 15:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1007.wikimedia.org
* 15:22 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
* 15:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:17 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31429 and previous config saved to /var/cache/conftool/dbconfig/20220719-151503-marostegui.json
* 15:14 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 15:13 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1001
* 15:12 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1001
* 15:03 moritzm: installing nghttp2 security updates
* 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31427 and previous config saved to /var/cache/conftool/dbconfig/20220719-145958-marostegui.json
* 14:50 moritzm: installing python-urllib3 security updates
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31426 and previous config saved to /var/cache/conftool/dbconfig/20220719-144453-marostegui.json
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31425 and previous config saved to /var/cache/conftool/dbconfig/20220719-144245-marostegui.json
* 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31424 and previous config saved to /var/cache/conftool/dbconfig/20220719-144208-marostegui.json
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31423 and previous config saved to /var/cache/conftool/dbconfig/20220719-142703-marostegui.json
* 14:23 dancy@deploy1002: Installation of scap version "4.11.0" completed for 557 hosts
* 14:22 dancy@deploy1002: Installing scap version "4.11.0" for 557 hosts
* 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest1001.eqiad.wmnet
* 14:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:16 moritzm: installing glib2.0 security updates
* 14:15 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31422 and previous config saved to /var/cache/conftool/dbconfig/20220719-141158-marostegui.json
* 14:11 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts sretest1001.eqiad.wmnet
* 13:58 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts sretest1001.eqiad.wmnet
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31421 and previous config saved to /var/cache/conftool/dbconfig/20220719-135652-marostegui.json
* 13:55 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts sretest1001.eqiad.wmnet
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31420 and previous config saved to /var/cache/conftool/dbconfig/20220719-135532-marostegui.json
* 13:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 13:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31419 and previous config saved to /var/cache/conftool/dbconfig/20220719-135511-marostegui.json
* 13:45 moritzm: installing cron security updates
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31418 and previous config saved to /var/cache/conftool/dbconfig/20220719-134006-marostegui.json
* 13:37 marostegui: Stop mysql on db1132 to upgrade package
* 13:34 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 13:33 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Resync after touching (duration: 02m 38s)
* 13:32 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 13:30 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 13:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
* 13:28 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 13:27 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
* 13:26 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 13:25 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31417 and previous config saved to /var/cache/conftool/dbconfig/20220719-132501-marostegui.json
* 13:24 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 13:23 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 13:22 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 13:22 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 13:21 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814373{{!}}brwikimedia: Use logo and wordmark in vector-2022 and minerva (T313194)]] (duration: 02m 48s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:16 hashar@deploy1002: Synchronized static/images/mobile/copyright: Config: [[gerrit:814372{{!}}brwikimedia: Add logo and wordmark for vector-2022 and minerva (T313194)]] (duration: 02m 57s)
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31416 and previous config saved to /var/cache/conftool/dbconfig/20220719-130956-marostegui.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31415 and previous config saved to /var/cache/conftool/dbconfig/20220719-130736-marostegui.json
* 13:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31414 and previous config saved to /var/cache/conftool/dbconfig/20220719-130716-marostegui.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31413 and previous config saved to /var/cache/conftool/dbconfig/20220719-125211-marostegui.json
* 12:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netboxdb1001.eqiad.wmnet
* 12:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:45 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:45 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb1001.eqiad.wmnet
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31412 and previous config saved to /var/cache/conftool/dbconfig/20220719-123706-marostegui.json
* 12:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netboxdb1001.eqiad.wmnet
* 12:30 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:26 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:25 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb1001.eqiad.wmnet
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31411 and previous config saved to /var/cache/conftool/dbconfig/20220719-122201-marostegui.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31409 and previous config saved to /var/cache/conftool/dbconfig/20220719-121941-marostegui.json
* 12:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31408 and previous config saved to /var/cache/conftool/dbconfig/20220719-121921-marostegui.json
* 12:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31407 and previous config saved to /var/cache/conftool/dbconfig/20220719-120416-marostegui.json
* 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:01 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache ([[phab:T310777|T310777]]) (duration: 02m 49s)
* 12:00 moritzm: upgrading ganeti/ulsfo to 3.0.2 [[phab:T312637|T312637]]
* 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31406 and previous config saved to /var/cache/conftool/dbconfig/20220719-115719-ladsgroup.json
* 11:52 urbanecm@deploy1002: Synchronized langlist: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 42s)
* 11:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 35s)
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31405 and previous config saved to /var/cache/conftool/dbconfig/20220719-114911-marostegui.json
* 11:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 49s)
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 11:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2001.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 11:43 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 56s)
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31404 and previous config saved to /var/cache/conftool/dbconfig/20220719-114214-ladsgroup.json
* 11:41 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating blkwiki ([[phab:T310777|T310777]])
* 11:37 urbanecm@deploy1002: Synchronized dblists: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 52s)
* 11:34 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating blkwiki ([[phab:T310777|T310777]]) (duration: 02m 47s)
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31403 and previous config saved to /var/cache/conftool/dbconfig/20220719-113406-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31401 and previous config saved to /var/cache/conftool/dbconfig/20220719-113158-marostegui.json
* 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31400 and previous config saved to /var/cache/conftool/dbconfig/20220719-113137-marostegui.json
* 11:27 moritzm: remove ganeti 3.0.1-2+deb11u0 from buster-wikimedia, superseded by ganeti 3.0.2-1~deb11u1 from Bullseye 11.4 point release [[phab:T312637|T312637]]
* 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31399 and previous config saved to /var/cache/conftool/dbconfig/20220719-112708-ladsgroup.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31398 and previous config saved to /var/cache/conftool/dbconfig/20220719-111632-marostegui.json
* 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31397 and previous config saved to /var/cache/conftool/dbconfig/20220719-111203-ladsgroup.json
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to plain, [[phab:T311686|T311686]]
* 11:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to plain, [[phab:T311686|T311686]]
* 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31396 and previous config saved to /var/cache/conftool/dbconfig/20220719-110127-marostegui.json
* 11:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netboxdb2001.codfw.wmnet
* 11:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:59 moritzm: draining ganeti2020 [[phab:T310483|T310483]]
* 10:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31395 and previous config saved to /var/cache/conftool/dbconfig/20220719-104622-marostegui.json
* 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31394 and previous config saved to /var/cache/conftool/dbconfig/20220719-104559-ladsgroup.json
* 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312990|T312990]])', diff saved to https://phabricator.wikimedia.org/P31393 and previous config saved to /var/cache/conftool/dbconfig/20220719-104414-marostegui.json
* 10:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P31392 and previous config saved to /var/cache/conftool/dbconfig/20220719-103341-root.json
* 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: Switch instance to DRBD, [[phab:T311686|T311686]]
* 10:05 elukey: reboot an-worker1127 - hdfs datanode caused CPU stalls
* 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 09:50 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netboxdb2001.codfw.wmnet
* 09:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox1001.wikimedia.org
* 09:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:46 moritzm: draining ganeti2029 [[phab:T310483|T310483]]
* 09:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:40 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox1001.wikimedia.org
* 09:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox2001.wikimedia.org
* 09:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 09:34 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:29 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox2001.wikimedia.org
* 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:00 urbanecm: Deployed patch for [[phab:T313205|T313205]]
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2018.codfw.wmnet to cluster codfw and group D
* 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2018.codfw.wmnet to cluster codfw and group D
* 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
* 08:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
* 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust db2167:3311 and db2167:3318 weight [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31390 and previous config saved to /var/cache/conftool/dbconfig/20220719-071836-marostegui.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2167:3311 and db2167:3318 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31389 and previous config saved to /var/cache/conftool/dbconfig/20220719-071656-marostegui.json
* 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2084.codfw.wmnet
* 06:56 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:51 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2084.codfw.wmnet
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2084 from dbctl [[phab:T313121|T313121]]', diff saved to https://phabricator.wikimedia.org/P31386 and previous config saved to /var/cache/conftool/dbconfig/20220719-051725-marostegui.json
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
 
== 2022-07-18 ==
* 23:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1050.eqiad.wmnet
* 23:46 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1050.eqiad.wmnet
* 23:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1049.eqiad.wmnet
* 23:07 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1049.eqiad.wmnet
* 21:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:36 sbassett: Deployed security fix for [[phab:T309894|T309894]]
* 20:58 ebernhardson: start reindex of all wikis except commonswiki and wikidatawiki in eqiad and codfw cirrus clusters
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:45 urbanecm: UTC late B&C window finished
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:45 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/: {{Gerrit|930ecb76a5a9266d498f40b49ab5ff82c01dbcf5}}: reindex: Detect index type from live mappings (duration: 02m 55s)
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8d1663c93d2ddeb107d5f9b8982a7f4a7b880aba}}: Turn off fixed width in main namespace on Wikisource ([[phab:T311607|T311607]]) (duration: 02m 41s)
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c258b25e8a47caf9d531f01798d32cd3f9b1605}}: Enable language switching button for logged-out users on non-pilot wikis ([[phab:T312861|T312861]]) (duration: 02m 43s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f99c5331380a8c03f4c447e2f73cb76afca337a2}}: Pin cu_log actor migration to old schema ([[phab:T233004|T233004]]) (duration: 02m 41s)
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|415c4ef44d9bf1abab6942fbbc552990a8e992c8}}: Collapse sidebar by default for anonymous users ([[phab:T287609|T287609]]) (duration: 02m 41s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/resources/src/moment/moment-locale-overrides.js: {{Gerrit|c4d8a217b4ce0a9f7aefaacc032136e7eb058d4d}}: Ensure custom locales for Moment.js overrides, dont change en ([[phab:T313188|T313188]]) (duration: 02m 44s)
* 20:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|76b7cd6379c25175570eeeb2a305de0fd0bc61e5}}: Mentorship: enable the Vue version of the dashboard in test ([[phab:T300532|T300532]]) (duration: 03m 00s)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:04 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 19:02 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31385 and previous config saved to /var/cache/conftool/dbconfig/20220718-184146-root.json
* 18:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 18:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2066.codfw.wmnet with OS bullseye
* 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31384 and previous config saved to /var/cache/conftool/dbconfig/20220718-182642-root.json
* 18:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS bullseye
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS bullseye
* 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31382 and previous config saved to /var/cache/conftool/dbconfig/20220718-181138-root.json
* 18:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
* 17:57 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2065.codfw.wmnet with reason: host reimage
* 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31381 and previous config saved to /var/cache/conftool/dbconfig/20220718-175634-root.json
* 17:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS bullseye
* 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31380 and previous config saved to /var/cache/conftool/dbconfig/20220718-174130-root.json
* 17:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31379 and previous config saved to /var/cache/conftool/dbconfig/20220718-172626-root.json
* 17:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31378 and previous config saved to /var/cache/conftool/dbconfig/20220718-171122-root.json
* 16:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31377 and previous config saved to /var/cache/conftool/dbconfig/20220718-165617-root.json
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31376 and previous config saved to /var/cache/conftool/dbconfig/20220718-165455-marostegui.json
* 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31375 and previous config saved to /var/cache/conftool/dbconfig/20220718-165349-marostegui.json
* 16:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 16:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31374 and previous config saved to /var/cache/conftool/dbconfig/20220718-165329-marostegui.json
* 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31373 and previous config saved to /var/cache/conftool/dbconfig/20220718-163824-marostegui.json
* 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31372 and previous config saved to /var/cache/conftool/dbconfig/20220718-162319-marostegui.json
* 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31371 and previous config saved to /var/cache/conftool/dbconfig/20220718-160813-marostegui.json
* 16:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31370 and previous config saved to /var/cache/conftool/dbconfig/20220718-160708-marostegui.json
* 16:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 16:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31369 and previous config saved to /var/cache/conftool/dbconfig/20220718-160648-marostegui.json
* 15:52 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:814846{{!}} Bumping portals to master (T128546)]] (duration: 02m 59s)
* 15:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31368 and previous config saved to /var/cache/conftool/dbconfig/20220718-155143-marostegui.json
* 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:49 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:814846{{!}} Bumping portals to master (T128546)]] (duration: 03m 03s)
* 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:40 ejegg: updated fundraising CiviCRM from {{Gerrit|55bc690b}} to {{Gerrit|b4a7154a}}
* 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31367 and previous config saved to /var/cache/conftool/dbconfig/20220718-153637-marostegui.json
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31366 and previous config saved to /var/cache/conftool/dbconfig/20220718-152132-marostegui.json
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31365 and previous config saved to /var/cache/conftool/dbconfig/20220718-152026-marostegui.json
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31364 and previous config saved to /var/cache/conftool/dbconfig/20220718-151944-marostegui.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31363 and previous config saved to /var/cache/conftool/dbconfig/20220718-150439-marostegui.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31362 and previous config saved to /var/cache/conftool/dbconfig/20220718-145909-ladsgroup.json
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2012.codfw.wmnet to cluster codfw and group C
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2012.codfw.wmnet to cluster codfw and group C
* 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P31361 and previous config saved to /var/cache/conftool/dbconfig/20220718-144934-marostegui.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31360 and previous config saved to /var/cache/conftool/dbconfig/20220718-144404-ladsgroup.json
* 14:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31359 and previous config saved to /var/cache/conftool/dbconfig/20220718-143428-marostegui.json
* 14:29 Lucas_WMDE: UTC afternoon backport+config window done
* 14:29 lucaswerkmeister-wmde@deploy1002: Finished scap: refresh everything after adding CampaignEvents to extension-list ([[phab:T311752|T311752]], only enabled in Beta so far), just in case (duration: 14m 40s)
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P31358 and previous config saved to /var/cache/conftool/dbconfig/20220718-142859-ladsgroup.json
* 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:14 lucaswerkmeister-wmde@deploy1002: Started scap: refresh everything after adding CampaignEvents to extension-list ([[phab:T311752|T311752]], only enabled in Beta so far), just in case
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31357 and previous config saved to /var/cache/conftool/dbconfig/20220718-141354-ladsgroup.json
* 14:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:813991{{!}}Load and configure the CampaignEvents extension where enabled (T311752)]] (2/2: should be prod no-op) (duration: 02m 40s)
* 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31356 and previous config saved to /var/cache/conftool/dbconfig/20220718-140947-ladsgroup.json
* 14:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31355 and previous config saved to /var/cache/conftool/dbconfig/20220718-140926-ladsgroup.json
* 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:813991{{!}}Load and configure the CampaignEvents extension where enabled (T311752)]] (1/2: should be no-op) (duration: 02m 51s)
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:813990{{!}}Enable the CampaignEvents extension on beta (T311752)]] (no-op) (duration: 02m 43s)
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31354 and previous config saved to /var/cache/conftool/dbconfig/20220718-135421-ladsgroup.json
* 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813989{{!}}Add config variable for the CampaignEvents extension (T311752)]] (no-op) (duration: 02m 55s)
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:813986{{!}}Add CampaignEvents to extension-list (T311752)]] (duration: 03m 08s)
* 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2028.codfw.wmnet to cluster codfw and group A
* 13:45 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2028.codfw.wmnet to cluster codfw and group A
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2018.codfw.wmnet with OS bullseye
* 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P31353 and previous config saved to /var/cache/conftool/dbconfig/20220718-133916-ladsgroup.json
* 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31352 and previous config saved to /var/cache/conftool/dbconfig/20220718-133414-marostegui.json
* 13:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31351 and previous config saved to /var/cache/conftool/dbconfig/20220718-133354-marostegui.json
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2028.codfw.wmnet
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814111{{!}}Make weighted_tags search default for commonswiki]] (duration: 02m 54s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31350 and previous config saved to /var/cache/conftool/dbconfig/20220718-132411-ladsgroup.json
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
* 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31349 and previous config saved to /var/cache/conftool/dbconfig/20220718-132009-ladsgroup.json
* 13:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31348 and previous config saved to /var/cache/conftool/dbconfig/20220718-131949-ladsgroup.json
* 13:19 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: Backport: [[gerrit:814767{{!}}Use getOption to detect user preferences (T313209)]] (duration: 02m 50s)
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31347 and previous config saved to /var/cache/conftool/dbconfig/20220718-131848-marostegui.json
* 13:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2018.codfw.wmnet with reason: host reimage
* 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 13:15 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814108{{!}}Update config for commons custommatch search]] (duration: 02m 55s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P31346 and previous config saved to /var/cache/conftool/dbconfig/20220718-130443-ladsgroup.json
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P31345 and previous config saved to /var/cache/conftool/dbconfig/20220718-130343-marostegui.json
* 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2018.codfw.wmnet with OS bullseye
* 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P31344 and previous config saved to /var/cache/conftool/dbconfig/20220718-124938-ladsgroup.json
* 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2012.codfw.wmnet with OS bullseye
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31343 and previous config saved to /var/cache/conftool/dbconfig/20220718-124838-marostegui.json
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31342 and previous config saved to /var/cache/conftool/dbconfig/20220718-124732-marostegui.json
* 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31341 and previous config saved to /var/cache/conftool/dbconfig/20220718-124712-marostegui.json
* 12:35 godog: update grafana to 8.5.9
* 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31340 and previous config saved to /var/cache/conftool/dbconfig/20220718-123433-ladsgroup.json
* 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31339 and previous config saved to /var/cache/conftool/dbconfig/20220718-123207-marostegui.json
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31338 and previous config saved to /var/cache/conftool/dbconfig/20220718-123029-ladsgroup.json
* 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31337 and previous config saved to /var/cache/conftool/dbconfig/20220718-123009-ladsgroup.json
* 12:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P31336 and previous config saved to /var/cache/conftool/dbconfig/20220718-121702-marostegui.json
* 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P31335 and previous config saved to /var/cache/conftool/dbconfig/20220718-121504-ladsgroup.json
* 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2012.codfw.wmnet with OS bullseye
* 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS bullseye
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31334 and previous config saved to /var/cache/conftool/dbconfig/20220718-120157-marostegui.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31333 and previous config saved to /var/cache/conftool/dbconfig/20220718-120051-marostegui.json
* 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31332 and previous config saved to /var/cache/conftool/dbconfig/20220718-120030-marostegui.json
* 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P31331 and previous config saved to /var/cache/conftool/dbconfig/20220718-115959-ladsgroup.json
* 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31330 and previous config saved to /var/cache/conftool/dbconfig/20220718-114525-marostegui.json
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31329 and previous config saved to /var/cache/conftool/dbconfig/20220718-114454-ladsgroup.json
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31328 and previous config saved to /var/cache/conftool/dbconfig/20220718-113947-ladsgroup.json
* 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31327 and previous config saved to /var/cache/conftool/dbconfig/20220718-113927-ladsgroup.json
* 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS bullseye
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P31326 and previous config saved to /var/cache/conftool/dbconfig/20220718-113020-marostegui.json
* 11:25 jbond: re-enable puppet post postgresql re-sync
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P31325 and previous config saved to /var/cache/conftool/dbconfig/20220718-112422-ladsgroup.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31324 and previous config saved to /var/cache/conftool/dbconfig/20220718-111515-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31323 and previous config saved to /var/cache/conftool/dbconfig/20220718-111409-marostegui.json
* 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31322 and previous config saved to /var/cache/conftool/dbconfig/20220718-111348-marostegui.json
* 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P31319 and previous config saved to /var/cache/conftool/dbconfig/20220718-110916-ladsgroup.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31318 and previous config saved to /var/cache/conftool/dbconfig/20220718-105843-marostegui.json
* 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31317 and previous config saved to /var/cache/conftool/dbconfig/20220718-105411-ladsgroup.json
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31316 and previous config saved to /var/cache/conftool/dbconfig/20220718-104921-ladsgroup.json
* 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31315 and previous config saved to /var/cache/conftool/dbconfig/20220718-104844-ladsgroup.json
* 10:48 jbond: disable puppet fleet wide to resync db
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P31314 and previous config saved to /var/cache/conftool/dbconfig/20220718-104337-marostegui.json
* 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P31313 and previous config saved to /var/cache/conftool/dbconfig/20220718-103339-ladsgroup.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31312 and previous config saved to /var/cache/conftool/dbconfig/20220718-102832-marostegui.json
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31311 and previous config saved to /var/cache/conftool/dbconfig/20220718-102726-marostegui.json
* 10:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 10:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31310 and previous config saved to /var/cache/conftool/dbconfig/20220718-102706-marostegui.json
* 10:26 Amir1: dbmaint on s5@eqiad ([[phab:T312863|T312863]])
* 10:26 Amir1: dbmaint on s5@codfw ([[phab:T312863|T312863]])
* 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
* 10:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P31308 and previous config saved to /var/cache/conftool/dbconfig/20220718-101834-ladsgroup.json
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31307 and previous config saved to /var/cache/conftool/dbconfig/20220718-101201-marostegui.json
* 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31306 and previous config saved to /var/cache/conftool/dbconfig/20220718-100329-ladsgroup.json
* 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31305 and previous config saved to /var/cache/conftool/dbconfig/20220718-095916-ladsgroup.json
* 09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31304 and previous config saved to /var/cache/conftool/dbconfig/20220718-095856-ladsgroup.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P31303 and previous config saved to /var/cache/conftool/dbconfig/20220718-095656-marostegui.json
* 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P31302 and previous config saved to /var/cache/conftool/dbconfig/20220718-094351-ladsgroup.json
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31301 and previous config saved to /var/cache/conftool/dbconfig/20220718-094150-marostegui.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31300 and previous config saved to /var/cache/conftool/dbconfig/20220718-094043-marostegui.json
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31299 and previous config saved to /var/cache/conftool/dbconfig/20220718-094033-marostegui.json
* 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P31298 and previous config saved to /var/cache/conftool/dbconfig/20220718-092845-ladsgroup.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31297 and previous config saved to /var/cache/conftool/dbconfig/20220718-092528-marostegui.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 [[phab:T311106|T311106]]', diff saved to https://phabricator.wikimedia.org/P31295 and previous config saved to /var/cache/conftool/dbconfig/20220718-091957-root.json
* 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31293 and previous config saved to /var/cache/conftool/dbconfig/20220718-091340-ladsgroup.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P31292 and previous config saved to /var/cache/conftool/dbconfig/20220718-091023-marostegui.json
* 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31291 and previous config saved to /var/cache/conftool/dbconfig/20220718-090919-ladsgroup.json
* 09:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31290 and previous config saved to /var/cache/conftool/dbconfig/20220718-090857-ladsgroup.json
* 09:05 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
* 09:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 08:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31289 and previous config saved to /var/cache/conftool/dbconfig/20220718-085518-marostegui.json
* 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P31288 and previous config saved to /var/cache/conftool/dbconfig/20220718-085352-ladsgroup.json
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31287 and previous config saved to /var/cache/conftool/dbconfig/20220718-085312-marostegui.json
* 08:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1109.eqiad.wmnet with reason: Maintenance
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31286 and previous config saved to /var/cache/conftool/dbconfig/20220718-085251-marostegui.json
* 08:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2012.codfw.wmnet with OS bullseye
* 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P31285 and previous config saved to /var/cache/conftool/dbconfig/20220718-083847-ladsgroup.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31284 and previous config saved to /var/cache/conftool/dbconfig/20220718-083746-marostegui.json
* 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
* 08:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2012.codfw.wmnet with reason: host reimage
* 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31283 and previous config saved to /var/cache/conftool/dbconfig/20220718-082342-ladsgroup.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P31282 and previous config saved to /var/cache/conftool/dbconfig/20220718-082241-marostegui.json
* 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31281 and previous config saved to /var/cache/conftool/dbconfig/20220718-081934-ladsgroup.json
* 08:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 08:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31280 and previous config saved to /var/cache/conftool/dbconfig/20220718-081914-ladsgroup.json
* 08:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2012.codfw.wmnet with OS bullseye
* 08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
* 08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 08:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 08:10 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31279 and previous config saved to /var/cache/conftool/dbconfig/20220718-080735-marostegui.json
* 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P31278 and previous config saved to /var/cache/conftool/dbconfig/20220718-080409-ladsgroup.json
* 08:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2028.codfw.wmnet
* 08:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
* 07:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31277 and previous config saved to /var/cache/conftool/dbconfig/20220718-075527-marostegui.json
* 07:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 07:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31276 and previous config saved to /var/cache/conftool/dbconfig/20220718-075501-marostegui.json
* 07:54 kharlan@deploy1002: Synchronized wmf-config: Config: [[gerrit:814708{{!}}Structured task: Disable free text for "other" rejection reason (T304099)]] (duration: 02m 41s)
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P31275 and previous config saved to /var/cache/conftool/dbconfig/20220718-074904-ladsgroup.json
* 07:47 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2028.codfw.wmnet with OS bullseye
* 07:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:40 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814706{{!}}Enable ContentTranslation out of Beta for ay, ilo, kg, ln, nso, and tn Wikipedias (T309384)]] (duration: 02m 51s)
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31274 and previous config saved to /var/cache/conftool/dbconfig/20220718-073956-marostegui.json
* 07:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
* 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2028.codfw.wmnet with reason: host reimage
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31273 and previous config saved to /var/cache/conftool/dbconfig/20220718-073359-ladsgroup.json
* 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31272 and previous config saved to /var/cache/conftool/dbconfig/20220718-072953-ladsgroup.json
* 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 07:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P31271 and previous config saved to /var/cache/conftool/dbconfig/20220718-072451-marostegui.json
* 07:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: Maintenance
* 07:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 13 hosts with reason: Maintenance
* 07:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 07:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS bullseye
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 07:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31270 and previous config saved to /var/cache/conftool/dbconfig/20220718-071711-ladsgroup.json
* 07:10 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:814015{{!}}Enable Content and Section translation on WPs with NLLB-200 MT support (T309384)]] (duration: 02m 53s)
* 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31269 and previous config saved to /var/cache/conftool/dbconfig/20220718-070946-marostegui.json
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31268 and previous config saved to /var/cache/conftool/dbconfig/20220718-070840-marostegui.json
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31267 and previous config saved to /var/cache/conftool/dbconfig/20220718-070820-marostegui.json
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P31266 and previous config saved to /var/cache/conftool/dbconfig/20220718-070205-ladsgroup.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P31265 and previous config saved to /var/cache/conftool/dbconfig/20220718-065315-marostegui.json
* 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P31264 and previous config saved to /var/cache/conftool/dbconfig/20220718-064700-ladsgroup.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P31263 and previous config saved to /var/cache/conftool/dbconfig/20220718-063809-marostegui.json
* 06:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31262 and previous config saved to /var/cache/conftool/dbconfig/20220718-063155-ladsgroup.json
* 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31261 and previous config saved to /var/cache/conftool/dbconfig/20220718-062648-ladsgroup.json
* 06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 06:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31260 and previous config saved to /var/cache/conftool/dbconfig/20220718-062304-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2166 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31259 and previous config saved to /var/cache/conftool/dbconfig/20220718-055051-marostegui.json
* 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2082.codfw.wmnet
* 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:39 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2082.codfw.wmnet
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2082 [[phab:T313003|T313003]]', diff saved to https://phabricator.wikimedia.org/P31258 and previous config saved to /var/cache/conftool/dbconfig/20220718-052605-marostegui.json
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T313070|T313070]])', diff saved to https://phabricator.wikimedia.org/P31257 and previous config saved to /var/cache/conftool/dbconfig/20220718-052250-marostegui.json
* 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 05:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 05:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 15 hosts with reason: Maintenance
* 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 15 hosts with reason: Maintenance
* 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
 
== 2022-07-17 ==
* 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31256 and previous config saved to /var/cache/conftool/dbconfig/20220717-180539-ladsgroup.json
* 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31255 and previous config saved to /var/cache/conftool/dbconfig/20220717-175034-ladsgroup.json
* 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31254 and previous config saved to /var/cache/conftool/dbconfig/20220717-173528-ladsgroup.json
* 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31253 and previous config saved to /var/cache/conftool/dbconfig/20220717-172023-ladsgroup.json
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31252 and previous config saved to /var/cache/conftool/dbconfig/20220717-155102-ladsgroup.json
* 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31251 and previous config saved to /var/cache/conftool/dbconfig/20220717-155025-ladsgroup.json
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31250 and previous config saved to /var/cache/conftool/dbconfig/20220717-153520-ladsgroup.json
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31249 and previous config saved to /var/cache/conftool/dbconfig/20220717-152015-ladsgroup.json
* 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31248 and previous config saved to /var/cache/conftool/dbconfig/20220717-150510-ladsgroup.json
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31247 and previous config saved to /var/cache/conftool/dbconfig/20220717-132751-ladsgroup.json
* 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 13:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31246 and previous config saved to /var/cache/conftool/dbconfig/20220717-132731-ladsgroup.json
* 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31245 and previous config saved to /var/cache/conftool/dbconfig/20220717-131226-ladsgroup.json
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31244 and previous config saved to /var/cache/conftool/dbconfig/20220717-125720-ladsgroup.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31243 and previous config saved to /var/cache/conftool/dbconfig/20220717-124215-ladsgroup.json
* 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31242 and previous config saved to /var/cache/conftool/dbconfig/20220717-110523-ladsgroup.json
* 11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31241 and previous config saved to /var/cache/conftool/dbconfig/20220717-110503-ladsgroup.json
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31240 and previous config saved to /var/cache/conftool/dbconfig/20220717-104958-ladsgroup.json
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31239 and previous config saved to /var/cache/conftool/dbconfig/20220717-103453-ladsgroup.json
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31238 and previous config saved to /var/cache/conftool/dbconfig/20220717-101948-ladsgroup.json
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31237 and previous config saved to /var/cache/conftool/dbconfig/20220717-084432-ladsgroup.json
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31236 and previous config saved to /var/cache/conftool/dbconfig/20220717-084411-ladsgroup.json
* 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31235 and previous config saved to /var/cache/conftool/dbconfig/20220717-082906-ladsgroup.json
* 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31234 and previous config saved to /var/cache/conftool/dbconfig/20220717-081401-ladsgroup.json
* 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31233 and previous config saved to /var/cache/conftool/dbconfig/20220717-075856-ladsgroup.json
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31232 and previous config saved to /var/cache/conftool/dbconfig/20220717-071149-ladsgroup.json
* 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31231 and previous config saved to /var/cache/conftool/dbconfig/20220717-071129-ladsgroup.json
* 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31230 and previous config saved to /var/cache/conftool/dbconfig/20220717-065624-ladsgroup.json
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31229 and previous config saved to /var/cache/conftool/dbconfig/20220717-064119-ladsgroup.json
* 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31228 and previous config saved to /var/cache/conftool/dbconfig/20220717-062614-ladsgroup.json
* 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31227 and previous config saved to /var/cache/conftool/dbconfig/20220717-044802-ladsgroup.json
* 04:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 04:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
* 04:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
* 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 02:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 01:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 01:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31226 and previous config saved to /var/cache/conftool/dbconfig/20220717-010309-ladsgroup.json
* 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31225 and previous config saved to /var/cache/conftool/dbconfig/20220717-004804-ladsgroup.json
* 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31224 and previous config saved to /var/cache/conftool/dbconfig/20220717-003259-ladsgroup.json
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31223 and previous config saved to /var/cache/conftool/dbconfig/20220717-001754-ladsgroup.json
* 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31222 and previous config saved to /var/cache/conftool/dbconfig/20220717-000143-ladsgroup.json
* 00:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 00:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
 
== 2022-07-16 ==
* 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31221 and previous config saved to /var/cache/conftool/dbconfig/20220716-221808-ladsgroup.json
* 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31220 and previous config saved to /var/cache/conftool/dbconfig/20220716-220303-ladsgroup.json
* 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31219 and previous config saved to /var/cache/conftool/dbconfig/20220716-214758-ladsgroup.json
* 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31218 and previous config saved to /var/cache/conftool/dbconfig/20220716-213253-ladsgroup.json
* 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31217 and previous config saved to /var/cache/conftool/dbconfig/20220716-203238-ladsgroup.json
* 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
* 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
* 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31216 and previous config saved to /var/cache/conftool/dbconfig/20220716-200803-ladsgroup.json
* 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P31215 and previous config saved to /var/cache/conftool/dbconfig/20220716-195258-ladsgroup.json
* 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P31214 and previous config saved to /var/cache/conftool/dbconfig/20220716-193753-ladsgroup.json
* 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31213 and previous config saved to /var/cache/conftool/dbconfig/20220716-192248-ladsgroup.json
* 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31212 and previous config saved to /var/cache/conftool/dbconfig/20220716-184459-ladsgroup.json
* 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31211 and previous config saved to /var/cache/conftool/dbconfig/20220716-184428-ladsgroup.json
* 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31210 and previous config saved to /var/cache/conftool/dbconfig/20220716-182922-ladsgroup.json
* 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31209 and previous config saved to /var/cache/conftool/dbconfig/20220716-181417-ladsgroup.json
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31208 and previous config saved to /var/cache/conftool/dbconfig/20220716-175912-ladsgroup.json
* 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31207 and previous config saved to /var/cache/conftool/dbconfig/20220716-174959-ladsgroup.json
* 17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31205 and previous config saved to /var/cache/conftool/dbconfig/20220716-173811-ladsgroup.json
* 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31204 and previous config saved to /var/cache/conftool/dbconfig/20220716-172305-ladsgroup.json
* 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31203 and previous config saved to /var/cache/conftool/dbconfig/20220716-170800-ladsgroup.json
* 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31202 and previous config saved to /var/cache/conftool/dbconfig/20220716-165255-ladsgroup.json
* 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31201 and previous config saved to /var/cache/conftool/dbconfig/20220716-163449-ladsgroup.json
* 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31200 and previous config saved to /var/cache/conftool/dbconfig/20220716-163418-ladsgroup.json
* 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31199 and previous config saved to /var/cache/conftool/dbconfig/20220716-161913-ladsgroup.json
* 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31198 and previous config saved to /var/cache/conftool/dbconfig/20220716-160408-ladsgroup.json
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31197 and previous config saved to /var/cache/conftool/dbconfig/20220716-154903-ladsgroup.json
* 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31196 and previous config saved to /var/cache/conftool/dbconfig/20220716-153647-ladsgroup.json
* 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31195 and previous config saved to /var/cache/conftool/dbconfig/20220716-153627-ladsgroup.json
* 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31194 and previous config saved to /var/cache/conftool/dbconfig/20220716-152122-ladsgroup.json
* 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31193 and previous config saved to /var/cache/conftool/dbconfig/20220716-150616-ladsgroup.json
* 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31192 and previous config saved to /var/cache/conftool/dbconfig/20220716-145111-ladsgroup.json
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31191 and previous config saved to /var/cache/conftool/dbconfig/20220716-143705-ladsgroup.json
* 14:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31190 and previous config saved to /var/cache/conftool/dbconfig/20220716-143645-ladsgroup.json
* 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31189 and previous config saved to /var/cache/conftool/dbconfig/20220716-142140-ladsgroup.json
* 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31188 and previous config saved to /var/cache/conftool/dbconfig/20220716-140634-ladsgroup.json
* 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31187 and previous config saved to /var/cache/conftool/dbconfig/20220716-135129-ladsgroup.json
* 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31186 and previous config saved to /var/cache/conftool/dbconfig/20220716-134429-ladsgroup.json
* 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 00:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS bullseye
* 00:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2064.codfw.wmnet with reason: host reimage
* 00:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2064.codfw.wmnet with reason: host reimage
* 00:13 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS bullseye
 
== 2022-07-15 ==
* 23:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 23:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31185 and previous config saved to /var/cache/conftool/dbconfig/20220715-231400-ladsgroup.json
* 22:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31184 and previous config saved to /var/cache/conftool/dbconfig/20220715-225855-ladsgroup.json
* 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31183 and previous config saved to /var/cache/conftool/dbconfig/20220715-224350-ladsgroup.json
* 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31182 and previous config saved to /var/cache/conftool/dbconfig/20220715-222845-ladsgroup.json
* 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31181 and previous config saved to /var/cache/conftool/dbconfig/20220715-222427-ladsgroup.json
* 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31180 and previous config saved to /var/cache/conftool/dbconfig/20220715-222407-ladsgroup.json
* 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31179 and previous config saved to /var/cache/conftool/dbconfig/20220715-220902-ladsgroup.json
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31178 and previous config saved to /var/cache/conftool/dbconfig/20220715-215357-ladsgroup.json
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31177 and previous config saved to /var/cache/conftool/dbconfig/20220715-213852-ladsgroup.json
* 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31176 and previous config saved to /var/cache/conftool/dbconfig/20220715-213153-ladsgroup.json
* 21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31175 and previous config saved to /var/cache/conftool/dbconfig/20220715-213133-ladsgroup.json
* 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31174 and previous config saved to /var/cache/conftool/dbconfig/20220715-211628-ladsgroup.json
* 21:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS bullseye
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31173 and previous config saved to /var/cache/conftool/dbconfig/20220715-210122-ladsgroup.json
* 20:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2063.codfw.wmnet with reason: host reimage
* 20:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2063.codfw.wmnet with reason: host reimage
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31172 and previous config saved to /var/cache/conftool/dbconfig/20220715-204617-ladsgroup.json
* 20:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31171 and previous config saved to /var/cache/conftool/dbconfig/20220715-203909-ladsgroup.json
* 20:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS bullseye
* 20:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31170 and previous config saved to /var/cache/conftool/dbconfig/20220715-203849-ladsgroup.json
* 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31169 and previous config saved to /var/cache/conftool/dbconfig/20220715-202344-ladsgroup.json
* 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31168 and previous config saved to /var/cache/conftool/dbconfig/20220715-200839-ladsgroup.json
* 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31167 and previous config saved to /var/cache/conftool/dbconfig/20220715-195334-ladsgroup.json
* 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31166 and previous config saved to /var/cache/conftool/dbconfig/20220715-194418-ladsgroup.json
* 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
* 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31165 and previous config saved to /var/cache/conftool/dbconfig/20220715-194358-ladsgroup.json
* 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS bullseye
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31164 and previous config saved to /var/cache/conftool/dbconfig/20220715-192852-ladsgroup.json
* 19:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2062.codfw.wmnet with reason: host reimage
* 19:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2062.codfw.wmnet with reason: host reimage
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31163 and previous config saved to /var/cache/conftool/dbconfig/20220715-191347-ladsgroup.json
* 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS bullseye
* 19:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS bullseye
* 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31162 and previous config saved to /var/cache/conftool/dbconfig/20220715-185842-ladsgroup.json
* 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31161 and previous config saved to /var/cache/conftool/dbconfig/20220715-185107-ladsgroup.json
* 18:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31160 and previous config saved to /var/cache/conftool/dbconfig/20220715-185047-ladsgroup.json
* 18:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2061.codfw.wmnet with reason: host reimage
* 18:44 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2061.codfw.wmnet with reason: host reimage
* 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31159 and previous config saved to /var/cache/conftool/dbconfig/20220715-183542-ladsgroup.json
* 18:31 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS bullseye
* 18:30 ryankemper: [[phab:T300943|T300943]] Re-imaging `elastic20[61-72]` from buster -> bullseye, one host at a time. These hosts are not in service currently so re-imaging is safe.
* 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31158 and previous config saved to /var/cache/conftool/dbconfig/20220715-182037-ladsgroup.json
* 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31157 and previous config saved to /var/cache/conftool/dbconfig/20220715-180532-ladsgroup.json
* 18:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS bullseye
* 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31156 and previous config saved to /var/cache/conftool/dbconfig/20220715-175822-ladsgroup.json
* 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31155 and previous config saved to /var/cache/conftool/dbconfig/20220715-175801-ladsgroup.json
* 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS bullseye
* 17:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
* 17:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
* 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31154 and previous config saved to /var/cache/conftool/dbconfig/20220715-174256-ladsgroup.json
* 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
* 17:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS bullseye
* 17:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31152 and previous config saved to /var/cache/conftool/dbconfig/20220715-172751-ladsgroup.json
* 17:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
* 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31151 and previous config saved to /var/cache/conftool/dbconfig/20220715-171246-ladsgroup.json
* 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31150 and previous config saved to /var/cache/conftool/dbconfig/20220715-170545-ladsgroup.json
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 17:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
* 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
* 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 16:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 16:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 6 hosts with reason: Maintenance
* 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 6 hosts with reason: Maintenance
* 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31149 and previous config saved to /var/cache/conftool/dbconfig/20220715-155021-ladsgroup.json
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31148 and previous config saved to /var/cache/conftool/dbconfig/20220715-153515-ladsgroup.json
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31147 and previous config saved to /var/cache/conftool/dbconfig/20220715-152010-ladsgroup.json
* 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31146 and previous config saved to /var/cache/conftool/dbconfig/20220715-150505-ladsgroup.json
* 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31144 and previous config saved to /var/cache/conftool/dbconfig/20220715-140451-ladsgroup.json
* 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31143 and previous config saved to /var/cache/conftool/dbconfig/20220715-140431-ladsgroup.json
* 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31141 and previous config saved to /var/cache/conftool/dbconfig/20220715-134926-ladsgroup.json
* 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31140 and previous config saved to /var/cache/conftool/dbconfig/20220715-133421-ladsgroup.json
* 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31139 and previous config saved to /var/cache/conftool/dbconfig/20220715-131916-ladsgroup.json
* 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31138 and previous config saved to /var/cache/conftool/dbconfig/20220715-130706-ladsgroup.json
* 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31137 and previous config saved to /var/cache/conftool/dbconfig/20220715-130634-ladsgroup.json
* 13:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 13:05 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31136 and previous config saved to /var/cache/conftool/dbconfig/20220715-125129-ladsgroup.json
* 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31135 and previous config saved to /var/cache/conftool/dbconfig/20220715-123624-ladsgroup.json
* 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31134 and previous config saved to /var/cache/conftool/dbconfig/20220715-122119-ladsgroup.json
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31133 and previous config saved to /var/cache/conftool/dbconfig/20220715-120750-ladsgroup.json
* 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31132 and previous config saved to /var/cache/conftool/dbconfig/20220715-120713-ladsgroup.json
* 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31131 and previous config saved to /var/cache/conftool/dbconfig/20220715-115207-ladsgroup.json
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31130 and previous config saved to /var/cache/conftool/dbconfig/20220715-113702-ladsgroup.json
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31129 and previous config saved to /var/cache/conftool/dbconfig/20220715-112157-ladsgroup.json
* 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31128 and previous config saved to /var/cache/conftool/dbconfig/20220715-105748-ladsgroup.json
* 10:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:56 hashar@deploy1002: Finished deploy [integration/docroot@e563641]: Add banana-i18n library (duration: 00m 08s)
* 10:56 hashar@deploy1002: Started deploy [integration/docroot@e563641]: Add banana-i18n library
* 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31127 and previous config saved to /var/cache/conftool/dbconfig/20220715-103513-ladsgroup.json
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31126 and previous config saved to /var/cache/conftool/dbconfig/20220715-102008-ladsgroup.json
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31125 and previous config saved to /var/cache/conftool/dbconfig/20220715-100503-ladsgroup.json
* 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31124 and previous config saved to /var/cache/conftool/dbconfig/20220715-094958-ladsgroup.json
* 09:38 Amir1: killed refreshLinkRecommendations.php in testwiki ([[phab:T299021|T299021]])
* 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31123 and previous config saved to /var/cache/conftool/dbconfig/20220715-093449-ladsgroup.json
* 09:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:26 moritzm: update thirdparty/node16 to Node 16.16.0
* 07:26 moritzm: update thirdparty/node14 to Node 14.20.0
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31121 and previous config saved to /var/cache/conftool/dbconfig/20220715-064928-root.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31120 and previous config saved to /var/cache/conftool/dbconfig/20220715-063424-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31119 and previous config saved to /var/cache/conftool/dbconfig/20220715-061920-root.json
* 06:08 ryankemper: [[phab:T311939|T311939]] Updated list of masters for psi-codfw search to `elastic2027.codfw.wmnet:9700,elastic2029.codfw.wmnet:9700,elastic2054.codfw.wmnet:9700`
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31118 and previous config saved to /var/cache/conftool/dbconfig/20220715-060416-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31117 and previous config saved to /var/cache/conftool/dbconfig/20220715-054912-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31116 and previous config saved to /var/cache/conftool/dbconfig/20220715-053408-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31115 and previous config saved to /var/cache/conftool/dbconfig/20220715-051904-root.json
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31114 and previous config saved to /var/cache/conftool/dbconfig/20220715-050400-root.json
* 00:30 TimStarling: on ms-fe1010 restarting swift-proxy
 
== 2022-07-14 ==
* 22:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 22:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31112 and previous config saved to /var/cache/conftool/dbconfig/20220714-221112-ladsgroup.json
* 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31111 and previous config saved to /var/cache/conftool/dbconfig/20220714-215606-ladsgroup.json
* 21:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31110 and previous config saved to /var/cache/conftool/dbconfig/20220714-214101-ladsgroup.json
* 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31109 and previous config saved to /var/cache/conftool/dbconfig/20220714-212556-ladsgroup.json
* 21:15 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31108 and previous config saved to /var/cache/conftool/dbconfig/20220714-210347-ladsgroup.json
* 21:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 21:03 ryankemper: [[phab:T289135|T289135]] First host reimage done, manually killed rolling-operation cookbook before the next host reimage so that we can test out https://gerrit.wikimedia.org/r/813979
* 21:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
* 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31107 and previous config saved to /var/cache/conftool/dbconfig/20220714-210327-ladsgroup.json
* 21:02 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 20:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2027.codfw.wmnet with OS bullseye
* 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31106 and previous config saved to /var/cache/conftool/dbconfig/20220714-204822-ladsgroup.json
* 20:45 thcipriani: utc-late backport window complete
* 20:45 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CampaignEvents: Backport: [[gerrit:813657{{!}}CampaignEvents: backport extension for Jul 18 beta deploy (T311752)]] (duration: 02m 49s)
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:36 ryankemper: Restarting elastic services `ryankemper@elastic2054:~$ sudo systemctl restart elasticsearch_6@production*`
* 20:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic2027.codfw.wmnet with reason: host reimage
* 20:34 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2027.codfw.wmnet with reason: host reimage
* 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31105 and previous config saved to /var/cache/conftool/dbconfig/20220714-203317-ladsgroup.json
* 20:33 ryankemper: [Elastic] `ryankemper@elastic2054:~$ sudo run-puppet-agent` to add 2054 as an eligible master for codfw-psi
* 20:30 ryankemper: [Elastic] We're working on promoting `elastic2054` to a master to replace `elastic2049` which is in hw failure
* 20:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudweb1004.wikimedia.org with OS bullseye
* 20:18 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2027.codfw.wmnet with OS bullseye
* 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31104 and previous config saved to /var/cache/conftool/dbconfig/20220714-201812-ladsgroup.json
* 20:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - [[phab:T289135|T289135]]
* 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31103 and previous config saved to /var/cache/conftool/dbconfig/20220714-195715-ladsgroup.json
* 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31102 and previous config saved to /var/cache/conftool/dbconfig/20220714-195655-ladsgroup.json
* 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31100 and previous config saved to /var/cache/conftool/dbconfig/20220714-194150-ladsgroup.json
* 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31098 and previous config saved to /var/cache/conftool/dbconfig/20220714-192645-ladsgroup.json
* 19:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudweb1003.wikimedia.org with OS bullseye
* 19:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS bullseye
* 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31097 and previous config saved to /var/cache/conftool/dbconfig/20220714-191140-ladsgroup.json
* 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31096 and previous config saved to /var/cache/conftool/dbconfig/20220714-182328-ladsgroup.json
* 18:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31095 and previous config saved to /var/cache/conftool/dbconfig/20220714-182308-ladsgroup.json
* 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
* 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31094 and previous config saved to /var/cache/conftool/dbconfig/20220714-180803-ladsgroup.json
* 18:02 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudweb1003.wikimedia.org with OS bullseye
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
* 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31093 and previous config saved to /var/cache/conftool/dbconfig/20220714-175258-ladsgroup.json
* 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31092 and previous config saved to /var/cache/conftool/dbconfig/20220714-173753-ladsgroup.json
* 17:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:15 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:15 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31091 and previous config saved to /var/cache/conftool/dbconfig/20220714-163953-ladsgroup.json
* 16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31090 and previous config saved to /var/cache/conftool/dbconfig/20220714-163933-ladsgroup.json
* 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31089 and previous config saved to /var/cache/conftool/dbconfig/20220714-162428-ladsgroup.json
* 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31088 and previous config saved to /var/cache/conftool/dbconfig/20220714-160923-ladsgroup.json
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31087 and previous config saved to /var/cache/conftool/dbconfig/20220714-160846-marostegui.json
* 16:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 16:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31086 and previous config saved to /var/cache/conftool/dbconfig/20220714-155418-ladsgroup.json
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31085 and previous config saved to /var/cache/conftool/dbconfig/20220714-155341-marostegui.json
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31084 and previous config saved to /var/cache/conftool/dbconfig/20220714-153836-marostegui.json
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31083 and previous config saved to /var/cache/conftool/dbconfig/20220714-152331-marostegui.json
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31082 and previous config saved to /var/cache/conftool/dbconfig/20220714-152118-marostegui.json
* 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31081 and previous config saved to /var/cache/conftool/dbconfig/20220714-152040-marostegui.json
* 15:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: sync
* 15:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: sync
* 15:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: sync
* 15:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: sync
* 15:13 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
* 15:13 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
* 15:12 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b8f66e9]: (no justification provided) (duration: 00m 10s)
* 15:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b8f66e9]: (no justification provided)
* 15:10 ejegg: updated payments-wiki from {{Gerrit|6a8aa302}} to {{Gerrit|be11fac2}}
* 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31080 and previous config saved to /var/cache/conftool/dbconfig/20220714-150535-marostegui.json
* 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31079 and previous config saved to /var/cache/conftool/dbconfig/20220714-145736-ladsgroup.json
* 14:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31078 and previous config saved to /var/cache/conftool/dbconfig/20220714-145716-ladsgroup.json
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31077 and previous config saved to /var/cache/conftool/dbconfig/20220714-145030-marostegui.json
* 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31076 and previous config saved to /var/cache/conftool/dbconfig/20220714-144211-ladsgroup.json
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31075 and previous config saved to /var/cache/conftool/dbconfig/20220714-143525-marostegui.json
* 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31074 and previous config saved to /var/cache/conftool/dbconfig/20220714-142706-ladsgroup.json
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31073 and previous config saved to /var/cache/conftool/dbconfig/20220714-141917-marostegui.json
* 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 14:19 papaul: ongoing PDU maintenance in rack A6 codfw
* 14:19 papaul: ongoing PDU maintenance in rack A6 codfw
* 14:18 papaul: ongoing PDU maintenance in rack A6 codfw
* 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31072 and previous config saved to /var/cache/conftool/dbconfig/20220714-141846-marostegui.json
* 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31071 and previous config saved to /var/cache/conftool/dbconfig/20220714-141201-ladsgroup.json
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31070 and previous config saved to /var/cache/conftool/dbconfig/20220714-140341-marostegui.json
* 14:02 matthiasmullie: UTC afternoon backport window done
* 13:53 mlitn@deploy1002: Finished scap: Backport: [[gerrit:813829{{!}}Improve maint script output & update i18n messages]] (duration: 16m 05s)
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31069 and previous config saved to /var/cache/conftool/dbconfig/20220714-135038-ladsgroup.json
* 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31068 and previous config saved to /var/cache/conftool/dbconfig/20220714-135000-ladsgroup.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31067 and previous config saved to /var/cache/conftool/dbconfig/20220714-134836-marostegui.json
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 mlitn@deploy1002: Started scap: Backport: [[gerrit:813829{{!}}Improve maint script output & update i18n messages]]
* 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31065 and previous config saved to /var/cache/conftool/dbconfig/20220714-133455-ladsgroup.json
* 13:34 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813881{{!}}Update boosts for weighted_tags]] (duration: 02m 45s)
* 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31064 and previous config saved to /var/cache/conftool/dbconfig/20220714-133331-marostegui.json
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31063 and previous config saved to /var/cache/conftool/dbconfig/20220714-133051-marostegui.json
* 13:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31062 and previous config saved to /var/cache/conftool/dbconfig/20220714-133031-marostegui.json
* 13:30 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813880{{!}}Add custommatch search feature config for commons]] (duration: 02m 58s)
* 13:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813609{{!}}Enable Special:NewLexemeAlpha on Wikidata and TestWikidata (T306016)]] (re-sync, config change seemingly not consistently picked up) (duration: 02m 45s)
* 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31061 and previous config saved to /var/cache/conftool/dbconfig/20220714-131950-ladsgroup.json
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:15 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813609{{!}}Enable Special:NewLexemeAlpha on Wikidata and TestWikidata (T306016)]] (duration: 02m 57s)
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31060 and previous config saved to /var/cache/conftool/dbconfig/20220714-131525-marostegui.json
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31059 and previous config saved to /var/cache/conftool/dbconfig/20220714-130445-ladsgroup.json
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31058 and previous config saved to /var/cache/conftool/dbconfig/20220714-130020-marostegui.json
* 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31057 and previous config saved to /var/cache/conftool/dbconfig/20220714-124515-marostegui.json
* 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31056 and previous config saved to /var/cache/conftool/dbconfig/20220714-124321-ladsgroup.json
* 12:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31055 and previous config saved to /var/cache/conftool/dbconfig/20220714-124239-marostegui.json
* 12:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31054 and previous config saved to /var/cache/conftool/dbconfig/20220714-124219-marostegui.json
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31053 and previous config saved to /var/cache/conftool/dbconfig/20220714-122714-marostegui.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31052 and previous config saved to /var/cache/conftool/dbconfig/20220714-121209-marostegui.json
* 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
* 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
* 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31051 and previous config saved to /var/cache/conftool/dbconfig/20220714-115701-marostegui.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31050 and previous config saved to /var/cache/conftool/dbconfig/20220714-115448-marostegui.json
* 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 11:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31049 and previous config saved to /var/cache/conftool/dbconfig/20220714-115316-marostegui.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31048 and previous config saved to /var/cache/conftool/dbconfig/20220714-113811-marostegui.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31047 and previous config saved to /var/cache/conftool/dbconfig/20220714-112304-marostegui.json
* 11:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
* 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312977|T312977]])', diff saved to https://phabricator.wikimedia.org/P31046 and previous config saved to /var/cache/conftool/dbconfig/20220714-110759-marostegui.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2164 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31038 and previous config saved to /var/cache/conftool/dbconfig/20220714-052056-marostegui.json
* 05:07 AndyRussG: update payments-wiki-staging {{Gerrit|10304f69}} -> {{Gerrit|be11fac2}}
* 04:32 oblivian@puppetmaster1001: conftool action : edit; selector: name=ReadOnly,scope=codfw
* 04:25 tstarling@puppetmaster1001: conftool action : edit; selector: name=ReadOnly,scope=codfw
* 04:23 tstarling@puppetmaster1001: conftool action : get/ReadOnly; selector: name=ReadOnly,scope=codfw
* 01:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I73fbfee8248c}} (duration: 02m 56s)
* 01:09 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I73fbfee8248c}} (duration: 02m 45s)
* 01:03 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: {{Gerrit|Ie11bdfdcf5e6724}} (duration: 02m 55s)
* 01:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:44 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: {{Gerrit|Ie11bdfdcf5e6724}} (duration: 02m 55s)
* 00:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:29 krinkle@deploy1002: Synchronized wmf-config/wikitech.php: {{Gerrit|Ib539da0c0953}} (duration: 02m 47s)
* 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
 
== 2022-07-13 ==
* 22:17 inflatador: bking@elastic2055 successfully staged NIC firmware updates for elastic2055-2060
* 22:09 inflatador: bking@elastic2055 staging NIC firmware updates for elastic2055-2060
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:09 Lucas_WMDE: UTC late backport+config window done
* 21:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813691{{!}}Disable DiscussionTools beta feature at mediawikiwiki (T310960)]] (duration: 02m 47s)
* 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:02 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:812377{{!}}QuickSurveys: Undeploy 'research-incentive' (T311015)]] (2/2, beta) (duration: 02m 58s)
* 20:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:812377{{!}}QuickSurveys: Undeploy 'research-incentive' (T311015)]] (1/2, prod) (duration: 02m 48s)
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/CommentItem.js: Backport: [[gerrit:813666{{!}}Avoid localized digits in internal timestamps in JS (T312828)]] (duration: 02m 49s)
* 20:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2040.codfw.wmnet with OS bullseye
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:813340{{!}}Undeploy CongressLookup (part 3) (T312894)]] (duration: 03m 00s)
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813339{{!}}Undeploy CongressLookup (part 2) (T312894)]] (duration: 02m 53s)
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:813338{{!}}Undeploy CongressLookup (part 1) (T312894)]] (duration: 03m 04s)
* 20:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2040.codfw.wmnet with reason: host reimage
* 20:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2040.codfw.wmnet with reason: host reimage
* 19:59 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2040.codfw.wmnet with OS bullseye
* 18:20 sukhe: upload pdns-recursor_4.6.2-1+wmf11u1 to apt.wm.org (bullseye) - [[phab:T305589|T305589]]
* 17:54 sukhe: upload dnsdist_1.7.2-1+wmf11u1 to apt.wm.org (bullseye) - [[phab:T305589|T305589]]
* 17:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 17:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 16:17 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@e58e61d]: (no justification provided) (duration: 00m 10s)
* 16:17 milimetric@deploy1002: Started deploy [airflow-dags/analytics@e58e61d]: (no justification provided)
* 15:59 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2040.codfw.wmnet with OS bullseye
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:56 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2040.codfw.wmnet with OS bullseye
* 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:12 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9edd1ab]: Deploy [airflow-dags/analytics@9edd1ab] (duration: 00m 10s)
* 15:12 aqu@deploy1002: Started deploy [airflow-dags/analytics@9edd1ab]: Deploy [airflow-dags/analytics@9edd1ab]
* 15:10 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9edd1ab]: Deploy [airflow-dags/analytics_test@9edd1ab] (duration: 00m 08s)
* 15:10 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9edd1ab]: Deploy [airflow-dags/analytics_test@9edd1ab]
* 14:52 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
* 14:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
* 14:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05] (duration: 00m 12s)
* 14:34 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05]
* 14:19 aqu: Deployed refinery using scap, then deployed onto hdfs
* 14:11 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
* 14:08 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67] (duration: 07m 42s)
* 14:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
* 14:01 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67]
* 14:00 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67] (duration: 00m 07s)
* 14:00 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67]
* 13:47 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from x1 master', diff saved to https://phabricator.wikimedia.org/P31037 and previous config saved to /var/cache/conftool/dbconfig/20220713-134413-marostegui.json
* 13:37 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
* 13:20 Lucas_WMDE: UTC afternoon backport window done
* 13:20 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host elastic2049.codfw.wmnet
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790399{{!}}Configure wgLexemeLexicalCategoryItemIds on Wikidata (T307441)]] (duration: 02m 45s)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813594{{!}}Configure $wgBabelCategoryNames on Test Wikidata (T312920)]] (duration: 02m 51s)
* 13:05 inflatador: bking@elastic2049 rebooting for read-only fs
* 13:04 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2049.codfw.wmnet
* 12:49 damilare: payments-wiki upgraded from {{Gerrit|2f95d8b4}} to {{Gerrit|6a8aa302}}
* 12:12 moritzm: draining ganeti2028 [[phab:T311686|T311686]]
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2018.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 12:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2018.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: codfw s8 sanitarium master switch
* 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: codfw s8 sanitarium master switch
* 10:42 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train (2nd try. --force) [analytics/refinery@bd39e67] (duration: 04m 52s)
* 10:38 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67]: Regular analytics weekly train (2nd try. --force) [analytics/refinery@bd39e67]
* 10:27 moritzm: draining ganeti1028 [[phab:T311686|T311686]]
* 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2012.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 10:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2012.codfw.wmnet with reason: Remove node for eventual reimage, [[phab:T311686|T311686]]
* 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31035 and previous config saved to /var/cache/conftool/dbconfig/20220713-090748-ladsgroup.json
* 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31034 and previous config saved to /var/cache/conftool/dbconfig/20220713-085244-ladsgroup.json
* 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31033 and previous config saved to /var/cache/conftool/dbconfig/20220713-083740-ladsgroup.json
* 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31032 and previous config saved to /var/cache/conftool/dbconfig/20220713-082236-ladsgroup.json
* 08:05 jayme: 'systemctl restart rsyslog' on kubernetes2007.codfw.wmnet,kubernetes2010.codfw.wmnet,kubernetes2014.codfw.wmnet,kubernetes2020.codfw.wmnet,kubernetes2009.codfw.wmnet
* 07:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 07:52 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 07:51 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 07:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31031 and previous config saved to /var/cache/conftool/dbconfig/20220713-070229-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31030 and previous config saved to /var/cache/conftool/dbconfig/20220713-064725-root.json
* 06:45 aqu: analytics/refinery deploy aborted, no more space to deploy in /srv on an-launcher1002 eqiad
* 06:44 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67] (duration: 27m 02s)
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31029 and previous config saved to /var/cache/conftool/dbconfig/20220713-063221-root.json
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31028 and previous config saved to /var/cache/conftool/dbconfig/20220713-061717-root.json
* 06:16 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67]
* 06:16 aqu: analytics/refinery deployment
* 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31027 and previous config saved to /var/cache/conftool/dbconfig/20220713-060213-root.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31026 and previous config saved to /var/cache/conftool/dbconfig/20220713-054709-root.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31025 and previous config saved to /var/cache/conftool/dbconfig/20220713-053205-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31024 and previous config saved to /var/cache/conftool/dbconfig/20220713-051701-root.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2162 in s8 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31023 and previous config saved to /var/cache/conftool/dbconfig/20220713-051239-marostegui.json
 
== 2022-07-12 ==
* 22:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2039.codfw.wmnet with OS bullseye
* 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec (duration: 02m 04s)
* 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec
* 22:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage
* 22:11 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage
* 21:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2039.codfw.wmnet with OS bullseye
* 20:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2038.codfw.wmnet with OS bullseye
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage
* 20:07 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage
* 19:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 19:38 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye
* 19:35 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 19:34 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye
* 19:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 19:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye
* 19:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 19:30 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye
* 19:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 19:26 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I3071c009c}} (2) (duration: 02m 45s)
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:20 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I3071c009c}} (duration: 03m 09s)
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update [[phab:T312298|T312298]]
* 19:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update [[phab:T312298|T312298]]
* 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:13 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic1065.eqiad.wmnet
* 19:13 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic1065.eqiad.wmnet
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2037.codfw.wmnet with OS bullseye
* 16:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage
* 16:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage
* 16:55 bblack: codfw dns repooled for front edge traffic
* 16:50 herron: ran failed codfw puppet agents
* 16:47 mutante: doc1002 - systemctl reset-failed
* 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1026.eqiad.wmnet
* 16:36 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 16:19 mutante: rebooting mwdebug2001 via ganeti2022
* 16:15 cwhite: repair networking on people2002
* 16:11 cwhite: repair networking on puppetdb2002
* 16:10 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1026.eqiad.wmnet
* 16:05 mutante: parse200[1-3] - restarted ferm
* 16:03 mutante: mw2401 through mw2410 - performing ferm restarts (without cumin, has its own issue)
* 15:57 mutante: mw2405 - restarted ferm
* 15:50 bblack: codfw dns depooled for front edge traffic
* 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic1065.eqiad.wmnet with reason: firmware update [[phab:T312298|T312298]]
* 15:48 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic1065.eqiad.wmnet with reason: firmware update [[phab:T312298|T312298]]
* 15:30 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 15:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 15:06 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 15:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 15:06 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2037.codfw.wmnet with OS bullseye
* 15:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 14:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 14:56 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 14:52 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 14:48 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 14:48 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 14:47 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2037.codfw.wmnet with OS bullseye
* 14:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on druid1008.eqiad.wmnet with reason: [[phab:T308331|T308331]] btullis
* 14:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on druid1008.eqiad.wmnet with reason: [[phab:T308331|T308331]] btullis
* 14:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 14:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 14:30 papaul: ongoing PDU maintenance in rack A5
* 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 14:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2037.codfw.wmnet
* 13:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
* 13:41 Lucas_WMDE: UTC afternoon backport window done
* 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/CommentItem.js: Backport: [[gerrit:812956{{!}}Parse 'DiscussionToolsTimestampFormatSwitchTime' config value as UTC (T312828)]] (duration: 02m 50s)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Rack move, [[phab:T308331|T308331]]
* 12:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Rack move, [[phab:T308331|T308331]]
* 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 10:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to x1 master until the replica is back from maintenance', diff saved to https://phabricator.wikimedia.org/P31018 and previous config saved to /var/cache/conftool/dbconfig/20220712-101246-marostegui.json
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for onsite maintenance [[phab:T308331|T308331]]', diff saved to https://phabricator.wikimedia.org/P31017 and previous config saved to /var/cache/conftool/dbconfig/20220712-101211-root.json
* 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:12 hashar: Restarted Zuul [[phab:T309371|T309371]]
* 08:58 hashar: Restarted Gerrit [[phab:T309371|T309371]]
* 08:25 hashar@deploy1002: Finished deploy [integration/docroot@c2cceaf]: Fix NPM URL for Wikimedia language-data library (duration: 00m 08s)
* 08:25 hashar@deploy1002: Started deploy [integration/docroot@c2cceaf]: Fix NPM URL for Wikimedia language-data library
* 07:10 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@89cb17d]: subgraph_and_query_mapping: Increase executor memory to 12g, use repartition (duration: 02m 02s)
* 07:08 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@89cb17d]: subgraph_and_query_mapping: Increase executor memory to 12g, use repartition
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P31014 and previous config saved to /var/cache/conftool/dbconfig/20220712-070240-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31013 and previous config saved to /var/cache/conftool/dbconfig/20220712-065352-root.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31012 and previous config saved to /var/cache/conftool/dbconfig/20220712-063848-root.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31011 and previous config saved to /var/cache/conftool/dbconfig/20220712-062344-root.json
* 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 06:12 marostegui: dbmaint s3@eqiad [[phab:T310011|T310011]]
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 [[phab:T311610|T311610]]', diff saved to https://phabricator.wikimedia.org/P31010 and previous config saved to /var/cache/conftool/dbconfig/20220712-060407-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1157 to s3 primary and set section read-write [[phab:T311610|T311610]]', diff saved to https://phabricator.wikimedia.org/P31009 and previous config saved to /var/cache/conftool/dbconfig/20220712-060058-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T311610|T311610]]', diff saved to https://phabricator.wikimedia.org/P31008 and previous config saved to /var/cache/conftool/dbconfig/20220712-060031-marostegui.json
* 06:00 marostegui: Starting s3 eqiad failover from db1123 to db1157 - [[phab:T311610|T311610]]
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1157 with weight 0 [[phab:T311610|T311610]]', diff saved to https://phabricator.wikimedia.org/P31007 and previous config saved to /var/cache/conftool/dbconfig/20220712-051927-root.json
* 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: Primary switchover s3 [[phab:T311610|T311610]]
* 05:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: Primary switchover s3 [[phab:T311610|T311610]]
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:10 ejegg: updated payments-wiki from {{Gerrit|53a7b7bd}} to {{Gerrit|2f95d8b4}}
 
== 2022-07-11 ==
* 21:49 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3ba1d4c]: subgraph_query_mapping_daily: Increase partitioning to 2048 (duration: 02m 02s)
* 21:47 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3ba1d4c]: subgraph_query_mapping_daily: Increase partitioning to 2048
* 20:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@a559f82]: subgraph: Use HivePartitionRangeSensor to wait for sparql queries (duration: 02m 00s)
* 20:36 TheresNoTime: UTC late deploys done
* 20:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@a559f82]: subgraph: Use HivePartitionRangeSensor to wait for sparql queries
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:28 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:812897{{!}}Migrate WikibaseTermboxInteraction from EventLogging to EventGate on all wikis (T290303)]] (duration: 02m 53s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I82262ef6773ab228}} try again ref [[phab:T311788|T311788]] (duration: 03m 07s)
* 19:41 hashar@deploy1002: Finished deploy [integration/docroot@fc5d65a]: Add language-data library (duration: 00m 08s)
* 19:41 hashar@deploy1002: Started deploy [integration/docroot@fc5d65a]: Add language-data library
* 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P31005 and previous config saved to /var/cache/conftool/dbconfig/20220711-193315-marostegui.json
* 18:32 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 17:10 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 16:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@02ab1c2]: use mode=reschedule on all airflow sensors (duration: 02m 02s)
* 16:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@02ab1c2]: use mode=reschedule on all airflow sensors
* 16:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1005.wikimedia.org with OS bullseye
* 16:11 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I82262ef6773ab228}} (duration: 02m 55s)
* 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 15:56 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2175.codfw.wmnet with OS bullseye
* 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1005.wikimedia.org with reason: host reimage
* 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1005.wikimedia.org with reason: host reimage
* 15:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:812892{{!}} Bumping portals to master (T128546)]] (duration: 02m 51s)
* 15:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2175.codfw.wmnet with reason: host reimage
* 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:812892{{!}} Bumping portals to master (T128546)]] (duration: 02m 58s)
* 15:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2175.codfw.wmnet with reason: host reimage
* 15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:28 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:27 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:23 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2175.codfw.wmnet with OS bullseye
* 15:08 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:34 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:11 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 13:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:53 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:53 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:53 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2163 to s8 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31002 and previous config saved to /var/cache/conftool/dbconfig/20220711-130441-marostegui.json
* 12:05 moritzm: updated bullseye netboot image for Bullseye 11.4 point release [[phab:T312637|T312637]]
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AniketArs out of all services on: 1292 hosts
* 10:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AniketArs out of all services on: 1292 hosts
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AniketArs out of all services on: 663 hosts
* 10:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AniketArs out of all services on: 663 hosts
* 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2027.codfw.wmnet to cluster codfw and group A
* 08:06 godog: trim thanos raw samples retention to 54w - [[phab:T311690|T311690]]
* 08:04 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2027.codfw.wmnet to cluster codfw and group A
* 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
* 07:52 godog: roll-restart swift-account swift-container across swift/thanos bullseye hosts - [[phab:T297959|T297959]]
* 07:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
* 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:43 taavi@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/PageTriage/includes/HookHandlers/UndeleteHookHandler.php: Backport: [[gerrit:812532{{!}}UndeleteHookHandler: fix namespace conditional (T311347)]] (duration: 02m 54s)
* 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS bullseye
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2080 from dbctl [[phab:T312618|T312618]]', diff saved to https://phabricator.wikimedia.org/P30999 and previous config saved to /var/cache/conftool/dbconfig/20220711-073346-marostegui.json
* 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2080.codfw.wmnet
* 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2027.codfw.wmnet with reason: host reimage
* 07:26 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 07:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2027.codfw.wmnet with reason: host reimage
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2080.codfw.wmnet
* 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS bullseye
* 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2077.codfw.wmnet
* 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2077.codfw.wmnet
* 06:28 _joe_: repool thumbor1005
* 06:28 _joe_: depooled thumbor1005, downgraded firejail, restarted units
* 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
 
== 2022-07-10 ==
* 13:48 godog: silence ProbeDown pages for thumbor:8800 until wed
 
== 2022-07-09 ==
* 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:48 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: {{Gerrit|I3e43b10d26858c5b}} (duration: 03m 37s)
* 01:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:35 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I1bb97d1d601}} (duration: 03m 24s)
* 01:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
 
== 2022-07-08 ==
* 21:44 ryankemper: [Elastic] Reshuffled shards on eqiad to get cluster back into green status (from yellow): https://phabricator.wikimedia.org/P30995#130117
* 21:32 ori: apt1001: reprepro -C main include buster-wikimedia libvmod-querysort_0.2_amd64.changes
* 19:58 thcipriani: quick phab downtime for deploy to fix [[phab:T312614|T312614]]
* 19:57 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 19:57 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 19:57 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 19:56 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 19:56 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 19:56 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 19:49 tzatziki: removing 2 files for legal compliance
* 18:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1001.wikimedia.org with OS bullseye
* 18:26 urandom: changing Cassandra superuser password, AQS cluster -- [[phab:T311652|T311652]]
* 18:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage
* 18:18 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage
* 18:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1001.wikimedia.org with OS bullseye
* 16:25 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:29 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:27 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:15 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:59 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1004.wikimedia.org with OS bullseye
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30990 and previous config saved to /var/cache/conftool/dbconfig/20220708-143411-root.json
* 14:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage
* 14:22 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30983 and previous config saved to /var/cache/conftool/dbconfig/20220708-141907-root.json
* 14:11 hashar@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: AddImage: Only process metadata for a single valid suggestion - [[phab:T312544|T312544]] (duration: 03m 25s)
* 14:09 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30978 and previous config saved to /var/cache/conftool/dbconfig/20220708-140404-root.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30975 and previous config saved to /var/cache/conftool/dbconfig/20220708-134900-root.json
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30974 and previous config saved to /var/cache/conftool/dbconfig/20220708-133356-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30973 and previous config saved to /var/cache/conftool/dbconfig/20220708-131852-root.json
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30971 and previous config saved to /var/cache/conftool/dbconfig/20220708-130348-root.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30970 and previous config saved to /var/cache/conftool/dbconfig/20220708-124844-root.json
* 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deneb.codfw.wmnet
* 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts deneb.codfw.wmnet
* 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage
* 09:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2016.codfw.wmnet to cluster codfw and group D
* 07:33 akosiaris: reboot rdb1009 for kernel upgrades
* 07:29 vgutierrez: restart pybal on lvs6002
* 07:22 akosiaris: reboot rdb1010 for kernel upgrades
* 06:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2016.codfw.wmnet to cluster codfw and group D
* 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
* 06:47 TimStarling: on mwmaint2002: using iptables to simulate cross-DC memcached traffic loss
* 06:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
* 06:05 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Switch $wgCentralAuthTokenCacheType to mcrouter-primary-dc (duration: 03m 18s)
* 06:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2016.codfw.wmnet with OS bullseye
* 06:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2077 from dbctl [[phab:T312191|T312191]]', diff saved to https://phabricator.wikimedia.org/P30963 and previous config saved to /var/cache/conftool/dbconfig/20220708-055334-marostegui.json
* 05:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2016.codfw.wmnet with reason: host reimage
* 05:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2016.codfw.wmnet with reason: host reimage
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2076.codfw.wmnet
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2076.codfw.wmnet
* 05:31 moritzm: draining ganeti2027 [[phab:T311686|T311686]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2076 from dbctl [[phab:T312190|T312190]]', diff saved to https://phabricator.wikimedia.org/P30962 and previous config saved to /var/cache/conftool/dbconfig/20220708-052926-marostegui.json
* 05:26 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2016.codfw.wmnet with OS bullseye
* 05:23 marostegui: dbmaint s3@eqiad [[phab:T312574|T312574]]
* 04:08 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@b5d49fe]: use mode=reschedule on all airflow sensors (duration: 02m 03s)
* 04:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@b5d49fe]: use mode=reschedule on all airflow sensors
* 03:33 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 03:22 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1004.wikimedia.org with OS bullseye
* 02:27 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@c271774]: Update rdf-spark-tools to 0.3.112 (duration: 02m 13s)
* 02:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye
* 02:25 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 02:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@c271774]: Update rdf-spark-tools to 0.3.112
* 02:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: RL use MainStash on dewiki {{Gerrit|I1c120d64d226}} (duration: 03m 21s)
* 01:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2182.codfw.wmnet with OS bullseye
* 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
* 01:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
* 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS bullseye
* 01:12 mutante: gitlab1004 - _still_ icinga alerts about rsync to decom'ed host. 'systemctl daemon-reload' to teach it about deleted units, then 'systemctl reset-failed', then RECOVERY [[phab:T307142|T307142]]
* 00:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2181.codfw.wmnet with OS bullseye
 
== 2022-07-07 ==
* 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: host reimage
* 23:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: host reimage
* 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2180.codfw.wmnet with OS bullseye
* 23:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: host reimage
* 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2181.codfw.wmnet with OS bullseye
* 23:26 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I9b97f79618}} (duration: 03m 23s)
* 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2180.codfw.wmnet with reason: host reimage
* 23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2179.codfw.wmnet with OS bullseye
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
* 22:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
* 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:56 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I1f2daab316}} (duration: 03m 43s)
* 22:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2178.codfw.wmnet with reason: host reimage
* 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2178.codfw.wmnet with reason: host reimage
* 22:42 krinkle@deploy1002: Synchronized wmf-config/missing.php: {{Gerrit|I13a4ba0e307a}} (duration: 03m 33s)
* 22:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2179.codfw.wmnet with OS bullseye
* 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2177.codfw.wmnet with OS bullseye
* 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2178.codfw.wmnet with OS bullseye
* 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2176.codfw.wmnet with OS bullseye
* 22:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
* 22:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
* 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2176.codfw.wmnet with reason: host reimage
* 21:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2177.codfw.wmnet with OS bullseye
* 21:33 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ice5302f791fb1d5}} (duration: 03m 18s)
* 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:28 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|Ice5302f791fb1d5}} (duration: 03m 18s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2177.codfw.wmnet with OS bullseye
* 20:55 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:812017{{!}}Migrate WikibaseTermboxInteraction from EventLogging to EventGate on testwiki (T290303)]] (duration: 03m 12s)
* 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e0a8f03]: tune subgraph_mapping_weekly based on first prod run (duration: 02m 05s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:49 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e0a8f03]: tune subgraph_mapping_weekly based on first prod run
* 20:49 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.19/includes/parser/ParserOutput.php: Backport: [[gerrit:811961{{!}}ParserOutput::mergeMapStrategy: don't crash if merging non-array values (T312242)]] (duration: 03m 05s)
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:38 thcipriani@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: Config: [[gerrit:791729{{!}}Enable VisualEditor on thwikibooks by default (T308379)]] (duration: 03m 13s)
* 20:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2176.codfw.wmnet with OS bullseye
* 20:34 thcipriani@deploy1002: Synchronized wmf-config/config/thwikibooks.yaml: Config: [[gerrit:791729{{!}}Enable VisualEditor on thwikibooks by default (T308379)]] (duration: 03m 25s)
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2181.mgmt.codfw.wmnet with reboot policy FORCED
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2182.mgmt.codfw.wmnet with reboot policy FORCED
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudweb1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudweb1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc1001.eqiad.wmnet
* 20:11 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:08 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 20:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:03 mutante: destroying former stretch backend of doc.wikimedia.org, replaced by doc1002 on buster ([[phab:T247653|T247653]])
* 20:03 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts doc1001.eqiad.wmnet
* 20:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.wikimedia.org with OS bullseye
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2182.mgmt.codfw.wmnet with reboot policy FORCED
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2181.mgmt.codfw.wmnet with reboot policy FORCED
* 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2180.mgmt.codfw.wmnet with reboot policy FORCED
* 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2179.mgmt.codfw.wmnet with reboot policy FORCED
* 19:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.wikimedia.org with reason: host reimage
* 19:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 19:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 19:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.wikimedia.org with reason: host reimage
* 19:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 19:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.wikimedia.org with OS bullseye
* 19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 19:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 19:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
* 19:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:10 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:07 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 19:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.wikimedia.org with reason: host reimage
* 19:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2180.mgmt.codfw.wmnet with reboot policy FORCED
* 18:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.wikimedia.org with reason: host reimage
* 18:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2179.mgmt.codfw.wmnet with reboot policy FORCED
* 18:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2178.mgmt.codfw.wmnet with reboot policy FORCED
* 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.wikimedia.org with OS bullseye
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 18:46 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1006.wikimedia.org with OS bullseye
* 18:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
* 18:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 18:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 18:26 brett@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2178.mgmt.codfw.wmnet with reboot policy FORCED
* 18:22 brett@cumin1001: START - Cookbook sre.dns.netbox
* 18:22 brett@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
* 18:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:16 brett@cumin1001: START - Cookbook sre.dns.netbox
* 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
* 18:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
* 17:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
* 17:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
* 17:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:51 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 17:51 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 17:39 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 17:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
* 17:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
* 17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1002.wikimedia.org with OS bullseye
* 17:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1006.wikimedia.org with OS bullseye
* 17:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
* 17:12 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:12 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:11 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:11 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:10 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:10 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1002.wikimedia.org with reason: host reimage
* 17:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1002.wikimedia.org with reason: host reimage
* 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.wikimedia.org with OS bullseye
* 16:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 16:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 16:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1002.wikimedia.org with OS bullseye
* 16:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
* 16:48 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 16:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.wikimedia.org with OS bullseye
* 16:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.wikimedia.org with reason: host reimage
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.wikimedia.org with reason: host reimage
* 16:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1001.wikimedia.org with OS bullseye
* 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 16:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1003.wikimedia.org with OS bullseye
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 16:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.wikimedia.org with OS bullseye
* 16:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
* 16:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1003.wikimedia.org with reason: host reimage
* 16:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
* 16:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 16:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1003.wikimedia.org with reason: host reimage
* 16:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bullseye
* 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30959 and previous config saved to /var/cache/conftool/dbconfig/20220707-160308-root.json
* 16:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 16:01 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 16:01 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 15:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
* 15:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30958 and previous config saved to /var/cache/conftool/dbconfig/20220707-154804-root.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30957 and previous config saved to /var/cache/conftool/dbconfig/20220707-153300-root.json
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30956 and previous config saved to /var/cache/conftool/dbconfig/20220707-151756-root.json
* 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2016.codfw.wmnet with reason: Drop from ganeti cluster for eventual reimage
* 15:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2016.codfw.wmnet with reason: Drop from ganeti cluster for eventual reimage
* 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2010.codfw.wmnet to cluster codfw and group C
* 15:09 moritzm: installing containerd security updates
* 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30955 and previous config saved to /var/cache/conftool/dbconfig/20220707-150252-root.json
* 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to cluster codfw and group C
* 14:54 reedy@deploy1002: Synchronized composer.json: Cleanup (duration: 03m 19s)
* 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
* 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30953 and previous config saved to /var/cache/conftool/dbconfig/20220707-144748-root.json
* 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
* 14:41 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to cluster codfw and group C
* 14:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to cluster codfw and group C
* 14:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30952 and previous config saved to /var/cache/conftool/dbconfig/20220707-143244-root.json
* 14:28 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye