You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31225 and previous config saved to /var/cache/conftool/dbconfig/20220717-004804-ladsgroup.json)
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31256 and previous config saved to /var/cache/conftool/dbconfig/20220717-180539-ladsgroup.json)

Revision as of 18:05, 17 July 2022

2022-07-17

  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31256 and previous config saved to /var/cache/conftool/dbconfig/20220717-180539-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31255 and previous config saved to /var/cache/conftool/dbconfig/20220717-175034-ladsgroup.json
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31254 and previous config saved to /var/cache/conftool/dbconfig/20220717-173528-ladsgroup.json
  • 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31253 and previous config saved to /var/cache/conftool/dbconfig/20220717-172023-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31252 and previous config saved to /var/cache/conftool/dbconfig/20220717-155102-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31251 and previous config saved to /var/cache/conftool/dbconfig/20220717-155025-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31250 and previous config saved to /var/cache/conftool/dbconfig/20220717-153520-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31249 and previous config saved to /var/cache/conftool/dbconfig/20220717-152015-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31248 and previous config saved to /var/cache/conftool/dbconfig/20220717-150510-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31247 and previous config saved to /var/cache/conftool/dbconfig/20220717-132751-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31246 and previous config saved to /var/cache/conftool/dbconfig/20220717-132731-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31245 and previous config saved to /var/cache/conftool/dbconfig/20220717-131226-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31244 and previous config saved to /var/cache/conftool/dbconfig/20220717-125720-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31243 and previous config saved to /var/cache/conftool/dbconfig/20220717-124215-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31242 and previous config saved to /var/cache/conftool/dbconfig/20220717-110523-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31241 and previous config saved to /var/cache/conftool/dbconfig/20220717-110503-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31240 and previous config saved to /var/cache/conftool/dbconfig/20220717-104958-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31239 and previous config saved to /var/cache/conftool/dbconfig/20220717-103453-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31238 and previous config saved to /var/cache/conftool/dbconfig/20220717-101948-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31237 and previous config saved to /var/cache/conftool/dbconfig/20220717-084432-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31236 and previous config saved to /var/cache/conftool/dbconfig/20220717-084411-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31235 and previous config saved to /var/cache/conftool/dbconfig/20220717-082906-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31234 and previous config saved to /var/cache/conftool/dbconfig/20220717-081401-ladsgroup.json
  • 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31233 and previous config saved to /var/cache/conftool/dbconfig/20220717-075856-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31232 and previous config saved to /var/cache/conftool/dbconfig/20220717-071149-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31231 and previous config saved to /var/cache/conftool/dbconfig/20220717-071129-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31230 and previous config saved to /var/cache/conftool/dbconfig/20220717-065624-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31229 and previous config saved to /var/cache/conftool/dbconfig/20220717-064119-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31228 and previous config saved to /var/cache/conftool/dbconfig/20220717-062614-ladsgroup.json
  • 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31227 and previous config saved to /var/cache/conftool/dbconfig/20220717-044802-ladsgroup.json
  • 04:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 01:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31226 and previous config saved to /var/cache/conftool/dbconfig/20220717-010309-ladsgroup.json
  • 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31225 and previous config saved to /var/cache/conftool/dbconfig/20220717-004804-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31224 and previous config saved to /var/cache/conftool/dbconfig/20220717-003259-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31223 and previous config saved to /var/cache/conftool/dbconfig/20220717-001754-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31222 and previous config saved to /var/cache/conftool/dbconfig/20220717-000143-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 00:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance

2022-07-16

  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31221 and previous config saved to /var/cache/conftool/dbconfig/20220716-221808-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31220 and previous config saved to /var/cache/conftool/dbconfig/20220716-220303-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31219 and previous config saved to /var/cache/conftool/dbconfig/20220716-214758-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31218 and previous config saved to /var/cache/conftool/dbconfig/20220716-213253-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31217 and previous config saved to /var/cache/conftool/dbconfig/20220716-203238-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T312984)', diff saved to https://phabricator.wikimedia.org/P31216 and previous config saved to /var/cache/conftool/dbconfig/20220716-200803-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P31215 and previous config saved to /var/cache/conftool/dbconfig/20220716-195258-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P31214 and previous config saved to /var/cache/conftool/dbconfig/20220716-193753-ladsgroup.json
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T312984)', diff saved to https://phabricator.wikimedia.org/P31213 and previous config saved to /var/cache/conftool/dbconfig/20220716-192248-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T312984)', diff saved to https://phabricator.wikimedia.org/P31212 and previous config saved to /var/cache/conftool/dbconfig/20220716-184459-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312984)', diff saved to https://phabricator.wikimedia.org/P31211 and previous config saved to /var/cache/conftool/dbconfig/20220716-184428-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31210 and previous config saved to /var/cache/conftool/dbconfig/20220716-182922-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31209 and previous config saved to /var/cache/conftool/dbconfig/20220716-181417-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312984)', diff saved to https://phabricator.wikimedia.org/P31208 and previous config saved to /var/cache/conftool/dbconfig/20220716-175912-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T312984)', diff saved to https://phabricator.wikimedia.org/P31207 and previous config saved to /var/cache/conftool/dbconfig/20220716-174959-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312984)', diff saved to https://phabricator.wikimedia.org/P31205 and previous config saved to /var/cache/conftool/dbconfig/20220716-173811-ladsgroup.json
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31204 and previous config saved to /var/cache/conftool/dbconfig/20220716-172305-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31203 and previous config saved to /var/cache/conftool/dbconfig/20220716-170800-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312984)', diff saved to https://phabricator.wikimedia.org/P31202 and previous config saved to /var/cache/conftool/dbconfig/20220716-165255-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312984)', diff saved to https://phabricator.wikimedia.org/P31201 and previous config saved to /var/cache/conftool/dbconfig/20220716-163449-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31200 and previous config saved to /var/cache/conftool/dbconfig/20220716-163418-ladsgroup.json
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31199 and previous config saved to /var/cache/conftool/dbconfig/20220716-161913-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31198 and previous config saved to /var/cache/conftool/dbconfig/20220716-160408-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31197 and previous config saved to /var/cache/conftool/dbconfig/20220716-154903-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31196 and previous config saved to /var/cache/conftool/dbconfig/20220716-153647-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31195 and previous config saved to /var/cache/conftool/dbconfig/20220716-153627-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31194 and previous config saved to /var/cache/conftool/dbconfig/20220716-152122-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31193 and previous config saved to /var/cache/conftool/dbconfig/20220716-150616-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31192 and previous config saved to /var/cache/conftool/dbconfig/20220716-145111-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31191 and previous config saved to /var/cache/conftool/dbconfig/20220716-143705-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31190 and previous config saved to /var/cache/conftool/dbconfig/20220716-143645-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31189 and previous config saved to /var/cache/conftool/dbconfig/20220716-142140-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31188 and previous config saved to /var/cache/conftool/dbconfig/20220716-140634-ladsgroup.json
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31187 and previous config saved to /var/cache/conftool/dbconfig/20220716-135129-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312984)', diff saved to https://phabricator.wikimedia.org/P31186 and previous config saved to /var/cache/conftool/dbconfig/20220716-134429-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS bullseye
  • 00:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2064.codfw.wmnet with reason: host reimage
  • 00:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2064.codfw.wmnet with reason: host reimage
  • 00:13 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS bullseye

2022-07-15

  • 23:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31185 and previous config saved to /var/cache/conftool/dbconfig/20220715-231400-ladsgroup.json
  • 22:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31184 and previous config saved to /var/cache/conftool/dbconfig/20220715-225855-ladsgroup.json
  • 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P31183 and previous config saved to /var/cache/conftool/dbconfig/20220715-224350-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31182 and previous config saved to /var/cache/conftool/dbconfig/20220715-222845-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T312984)', diff saved to https://phabricator.wikimedia.org/P31181 and previous config saved to /var/cache/conftool/dbconfig/20220715-222427-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31180 and previous config saved to /var/cache/conftool/dbconfig/20220715-222407-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31179 and previous config saved to /var/cache/conftool/dbconfig/20220715-220902-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P31178 and previous config saved to /var/cache/conftool/dbconfig/20220715-215357-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31177 and previous config saved to /var/cache/conftool/dbconfig/20220715-213852-ladsgroup.json
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T312984)', diff saved to https://phabricator.wikimedia.org/P31176 and previous config saved to /var/cache/conftool/dbconfig/20220715-213153-ladsgroup.json
  • 21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31175 and previous config saved to /var/cache/conftool/dbconfig/20220715-213133-ladsgroup.json
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31174 and previous config saved to /var/cache/conftool/dbconfig/20220715-211628-ladsgroup.json
  • 21:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS bullseye
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P31173 and previous config saved to /var/cache/conftool/dbconfig/20220715-210122-ladsgroup.json
  • 20:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2063.codfw.wmnet with reason: host reimage
  • 20:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2063.codfw.wmnet with reason: host reimage
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31172 and previous config saved to /var/cache/conftool/dbconfig/20220715-204617-ladsgroup.json
  • 20:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31171 and previous config saved to /var/cache/conftool/dbconfig/20220715-203909-ladsgroup.json
  • 20:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS bullseye
  • 20:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31170 and previous config saved to /var/cache/conftool/dbconfig/20220715-203849-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31169 and previous config saved to /var/cache/conftool/dbconfig/20220715-202344-ladsgroup.json
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P31168 and previous config saved to /var/cache/conftool/dbconfig/20220715-200839-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31167 and previous config saved to /var/cache/conftool/dbconfig/20220715-195334-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T312984)', diff saved to https://phabricator.wikimedia.org/P31166 and previous config saved to /var/cache/conftool/dbconfig/20220715-194418-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31165 and previous config saved to /var/cache/conftool/dbconfig/20220715-194358-ladsgroup.json
  • 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS bullseye
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31164 and previous config saved to /var/cache/conftool/dbconfig/20220715-192852-ladsgroup.json
  • 19:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2062.codfw.wmnet with reason: host reimage
  • 19:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2062.codfw.wmnet with reason: host reimage
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P31163 and previous config saved to /var/cache/conftool/dbconfig/20220715-191347-ladsgroup.json
  • 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS bullseye
  • 19:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS bullseye
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31162 and previous config saved to /var/cache/conftool/dbconfig/20220715-185842-ladsgroup.json
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31161 and previous config saved to /var/cache/conftool/dbconfig/20220715-185107-ladsgroup.json
  • 18:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31160 and previous config saved to /var/cache/conftool/dbconfig/20220715-185047-ladsgroup.json
  • 18:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2061.codfw.wmnet with reason: host reimage
  • 18:44 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2061.codfw.wmnet with reason: host reimage
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31159 and previous config saved to /var/cache/conftool/dbconfig/20220715-183542-ladsgroup.json
  • 18:31 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS bullseye
  • 18:30 ryankemper: T300943 Re-imaging `elastic20[61-72]` from buster -> bullseye, one host at a time. These hosts are not in service currently so re-imaging is safe.
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P31158 and previous config saved to /var/cache/conftool/dbconfig/20220715-182037-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31157 and previous config saved to /var/cache/conftool/dbconfig/20220715-180532-ladsgroup.json
  • 18:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS bullseye
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T312984)', diff saved to https://phabricator.wikimedia.org/P31156 and previous config saved to /var/cache/conftool/dbconfig/20220715-175822-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31155 and previous config saved to /var/cache/conftool/dbconfig/20220715-175801-ladsgroup.json
  • 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS bullseye
  • 17:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
  • 17:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31154 and previous config saved to /var/cache/conftool/dbconfig/20220715-174256-ladsgroup.json
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS bullseye
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P31152 and previous config saved to /var/cache/conftool/dbconfig/20220715-172751-ladsgroup.json
  • 17:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31151 and previous config saved to /var/cache/conftool/dbconfig/20220715-171246-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T312984)', diff saved to https://phabricator.wikimedia.org/P31150 and previous config saved to /var/cache/conftool/dbconfig/20220715-170545-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 6 hosts with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 6 hosts with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31149 and previous config saved to /var/cache/conftool/dbconfig/20220715-155021-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31148 and previous config saved to /var/cache/conftool/dbconfig/20220715-153515-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P31147 and previous config saved to /var/cache/conftool/dbconfig/20220715-152010-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31146 and previous config saved to /var/cache/conftool/dbconfig/20220715-150505-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T312984)', diff saved to https://phabricator.wikimedia.org/P31144 and previous config saved to /var/cache/conftool/dbconfig/20220715-140451-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31143 and previous config saved to /var/cache/conftool/dbconfig/20220715-140431-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31141 and previous config saved to /var/cache/conftool/dbconfig/20220715-134926-ladsgroup.json
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P31140 and previous config saved to /var/cache/conftool/dbconfig/20220715-133421-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31139 and previous config saved to /var/cache/conftool/dbconfig/20220715-131916-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T312984)', diff saved to https://phabricator.wikimedia.org/P31138 and previous config saved to /var/cache/conftool/dbconfig/20220715-130706-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31137 and previous config saved to /var/cache/conftool/dbconfig/20220715-130634-ladsgroup.json
  • 13:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:05 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31136 and previous config saved to /var/cache/conftool/dbconfig/20220715-125129-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P31135 and previous config saved to /var/cache/conftool/dbconfig/20220715-123624-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31134 and previous config saved to /var/cache/conftool/dbconfig/20220715-122119-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T312984)', diff saved to https://phabricator.wikimedia.org/P31133 and previous config saved to /var/cache/conftool/dbconfig/20220715-120750-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31132 and previous config saved to /var/cache/conftool/dbconfig/20220715-120713-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31131 and previous config saved to /var/cache/conftool/dbconfig/20220715-115207-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P31130 and previous config saved to /var/cache/conftool/dbconfig/20220715-113702-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31129 and previous config saved to /var/cache/conftool/dbconfig/20220715-112157-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T312984)', diff saved to https://phabricator.wikimedia.org/P31128 and previous config saved to /var/cache/conftool/dbconfig/20220715-105748-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:56 hashar@deploy1002: Finished deploy [integration/docroot@e563641]: Add banana-i18n library (duration: 00m 08s)
  • 10:56 hashar@deploy1002: Started deploy [integration/docroot@e563641]: Add banana-i18n library
  • 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31127 and previous config saved to /var/cache/conftool/dbconfig/20220715-103513-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31126 and previous config saved to /var/cache/conftool/dbconfig/20220715-102008-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P31125 and previous config saved to /var/cache/conftool/dbconfig/20220715-100503-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31124 and previous config saved to /var/cache/conftool/dbconfig/20220715-094958-ladsgroup.json
  • 09:38 Amir1: killed refreshLinkRecommendations.php in testwiki (T299021)
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T312984)', diff saved to https://phabricator.wikimedia.org/P31123 and previous config saved to /var/cache/conftool/dbconfig/20220715-093449-ladsgroup.json
  • 09:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:26 moritzm: update thirdparty/node16 to Node 16.16.0
  • 07:26 moritzm: update thirdparty/node14 to Node 14.20.0
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31121 and previous config saved to /var/cache/conftool/dbconfig/20220715-064928-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31120 and previous config saved to /var/cache/conftool/dbconfig/20220715-063424-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31119 and previous config saved to /var/cache/conftool/dbconfig/20220715-061920-root.json
  • 06:08 ryankemper: T311939 Updated list of masters for psi-codfw search to `elastic2027.codfw.wmnet:9700,elastic2029.codfw.wmnet:9700,elastic2054.codfw.wmnet:9700`
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31118 and previous config saved to /var/cache/conftool/dbconfig/20220715-060416-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31117 and previous config saved to /var/cache/conftool/dbconfig/20220715-054912-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31116 and previous config saved to /var/cache/conftool/dbconfig/20220715-053408-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31115 and previous config saved to /var/cache/conftool/dbconfig/20220715-051904-root.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31114 and previous config saved to /var/cache/conftool/dbconfig/20220715-050400-root.json
  • 00:30 TimStarling: on ms-fe1010 restarting swift-proxy
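
The db1135 entries above (05:04 to 06:49) record a staged repool that steps through 1%, 2%, 5%, 10%, 25%, 50%, 75% and 100%, roughly 15 minutes apart. A minimal sketch of how that progression could be pulled out of the log text follows. The regular expression, the function names and the `SAMPLE` list are illustrative assumptions, not part of dbctl or any Wikimedia tooling, and the sample lines are trimmed copies of three entries above.

```python
import re

# Hypothetical pattern for entries of the form
# "HH:MM user@host: dbctl commit (dc=all): '<db> (re)pooling @ N%: ...'"
STEP = re.compile(
    r"(\d{2}):(\d{2}) \S+: dbctl commit \(dc=all\): '(\S+) \(re\)pooling @ (\d+)%"
)

def repool_steps(lines):
    """Return (minutes_since_midnight, db_host, percent) tuples, oldest first."""
    steps = []
    for line in lines:
        match = STEP.search(line)
        if match:
            hh, mm, host, pct = match.groups()
            steps.append((int(hh) * 60 + int(mm), host, int(pct)))
    return sorted(steps)

def step_gaps(steps):
    """Minutes elapsed between consecutive repool steps."""
    return [later[0] - earlier[0] for earlier, later in zip(steps, steps[1:])]

# Trimmed copies of three log entries, newest first as in the log.
SAMPLE = [
    "  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 5%: After maintenance'",
    "  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 2%: After maintenance'",
    "  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 1%: After maintenance'",
]

if __name__ == "__main__":
    steps = repool_steps(SAMPLE)
    print(steps)             # [(304, 'db1135', 1), (319, 'db1135', 2), (334, 'db1135', 5)]
    print(step_gaps(steps))  # [15, 15]
```

Entries that do not carry a percentage, such as the 'Repooling after maintenance ...' commits elsewhere in this log, are skipped by this pattern.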

2022-07-14

  • 22:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312984)', diff saved to https://phabricator.wikimedia.org/P31112 and previous config saved to /var/cache/conftool/dbconfig/20220714-221112-ladsgroup.json
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31111 and previous config saved to /var/cache/conftool/dbconfig/20220714-215606-ladsgroup.json
  • 21:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31110 and previous config saved to /var/cache/conftool/dbconfig/20220714-214101-ladsgroup.json
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312984)', diff saved to https://phabricator.wikimedia.org/P31109 and previous config saved to /var/cache/conftool/dbconfig/20220714-212556-ladsgroup.json
  • 21:15 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T312984)', diff saved to https://phabricator.wikimedia.org/P31108 and previous config saved to /var/cache/conftool/dbconfig/20220714-210347-ladsgroup.json
  • 21:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:03 ryankemper: T289135 First host reimage done, manually killed rolling-operation cookbook before the next host reimage so that we can test out https://gerrit.wikimedia.org/r/813979
  • 21:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312984)', diff saved to https://phabricator.wikimedia.org/P31107 and previous config saved to /var/cache/conftool/dbconfig/20220714-210327-ladsgroup.json
  • 21:02 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 20:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2027.codfw.wmnet with OS bullseye
  • 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31106 and previous config saved to /var/cache/conftool/dbconfig/20220714-204822-ladsgroup.json
  • 20:45 thcipriani: utc-late backport window complete
  • 20:45 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CampaignEvents: Backport: CampaignEvents: backport extension for Jul 18 beta deploy (T311752) (duration: 02m 49s)
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 ryankemper: Restarting elastic services `ryankemper@elastic2054:~$ sudo systemctl restart elasticsearch_6@production*`
  • 20:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic2027.codfw.wmnet with reason: host reimage
  • 20:34 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2027.codfw.wmnet with reason: host reimage
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P31105 and previous config saved to /var/cache/conftool/dbconfig/20220714-203317-ladsgroup.json
  • 20:33 ryankemper: [Elastic] `ryankemper@elastic2054:~$ sudo run-puppet-agent` to add 2054 as an eligible master for codfw-psi
  • 20:30 ryankemper: [Elastic] We're working on promoting `elastic2054` to a master to replace `elastic2049` which is in hw failure
  • 20:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudweb1004.wikimedia.org with OS bullseye
  • 20:18 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2027.codfw.wmnet with OS bullseye
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312984)', diff saved to https://phabricator.wikimedia.org/P31104 and previous config saved to /var/cache/conftool/dbconfig/20220714-201812-ladsgroup.json
  • 20:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T312984)', diff saved to https://phabricator.wikimedia.org/P31103 and previous config saved to /var/cache/conftool/dbconfig/20220714-195715-ladsgroup.json
  • 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31102 and previous config saved to /var/cache/conftool/dbconfig/20220714-195655-ladsgroup.json
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31100 and previous config saved to /var/cache/conftool/dbconfig/20220714-194150-ladsgroup.json
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P31098 and previous config saved to /var/cache/conftool/dbconfig/20220714-192645-ladsgroup.json
  • 19:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudweb1003.wikimedia.org with OS bullseye
  • 19:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS bullseye
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31097 and previous config saved to /var/cache/conftool/dbconfig/20220714-191140-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31096 and previous config saved to /var/cache/conftool/dbconfig/20220714-182328-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31095 and previous config saved to /var/cache/conftool/dbconfig/20220714-182308-ladsgroup.json
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31094 and previous config saved to /var/cache/conftool/dbconfig/20220714-180803-ladsgroup.json
  • 18:02 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudweb1003.wikimedia.org with OS bullseye
  • 17:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P31093 and previous config saved to /var/cache/conftool/dbconfig/20220714-175258-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31092 and previous config saved to /var/cache/conftool/dbconfig/20220714-173753-ladsgroup.json
  • 17:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:15 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:15 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31091 and previous config saved to /var/cache/conftool/dbconfig/20220714-163953-ladsgroup.json
  • 16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31090 and previous config saved to /var/cache/conftool/dbconfig/20220714-163933-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31089 and previous config saved to /var/cache/conftool/dbconfig/20220714-162428-ladsgroup.json
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P31088 and previous config saved to /var/cache/conftool/dbconfig/20220714-160923-ladsgroup.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312977)', diff saved to https://phabricator.wikimedia.org/P31087 and previous config saved to /var/cache/conftool/dbconfig/20220714-160846-marostegui.json
  • 16:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31086 and previous config saved to /var/cache/conftool/dbconfig/20220714-155418-ladsgroup.json
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31085 and previous config saved to /var/cache/conftool/dbconfig/20220714-155341-marostegui.json
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31084 and previous config saved to /var/cache/conftool/dbconfig/20220714-153836-marostegui.json
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312977)', diff saved to https://phabricator.wikimedia.org/P31083 and previous config saved to /var/cache/conftool/dbconfig/20220714-152331-marostegui.json
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312977)', diff saved to https://phabricator.wikimedia.org/P31082 and previous config saved to /var/cache/conftool/dbconfig/20220714-152118-marostegui.json
  • 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31081 and previous config saved to /var/cache/conftool/dbconfig/20220714-152040-marostegui.json
  • 15:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: sync
  • 15:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: sync
  • 15:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: sync
  • 15:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: sync
  • 15:13 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
  • 15:13 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
  • 15:12 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b8f66e9]: (no justification provided) (duration: 00m 10s)
  • 15:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b8f66e9]: (no justification provided)
  • 15:10 ejegg: updated payments-wiki from 6a8aa302 to be11fac2
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31080 and previous config saved to /var/cache/conftool/dbconfig/20220714-150535-marostegui.json
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T312984)', diff saved to https://phabricator.wikimedia.org/P31079 and previous config saved to /var/cache/conftool/dbconfig/20220714-145736-ladsgroup.json
  • 14:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312984)', diff saved to https://phabricator.wikimedia.org/P31078 and previous config saved to /var/cache/conftool/dbconfig/20220714-145716-ladsgroup.json
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31077 and previous config saved to /var/cache/conftool/dbconfig/20220714-145030-marostegui.json
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31076 and previous config saved to /var/cache/conftool/dbconfig/20220714-144211-ladsgroup.json
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31075 and previous config saved to /var/cache/conftool/dbconfig/20220714-143525-marostegui.json
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P31074 and previous config saved to /var/cache/conftool/dbconfig/20220714-142706-ladsgroup.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31073 and previous config saved to /var/cache/conftool/dbconfig/20220714-141917-marostegui.json
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:19 papaul: ongoing PDU maintenance in rack A6 codfw
  • 14:19 papaul: ongoing PDU maintenance in rack A6 codfw
  • 14:18 papaul: ongoing PDU maintenance in rack A6 codfw
  • 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31072 and previous config saved to /var/cache/conftool/dbconfig/20220714-141846-marostegui.json
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312984)', diff saved to https://phabricator.wikimedia.org/P31071 and previous config saved to /var/cache/conftool/dbconfig/20220714-141201-ladsgroup.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31070 and previous config saved to /var/cache/conftool/dbconfig/20220714-140341-marostegui.json
  • 14:02 matthiasmullie: UTC afternoon backport window done
  • 13:53 mlitn@deploy1002: Finished scap: Backport: Improve maint script output & update i18n messages (duration: 16m 05s)
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T312984)', diff saved to https://phabricator.wikimedia.org/P31069 and previous config saved to /var/cache/conftool/dbconfig/20220714-135038-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312984)', diff saved to https://phabricator.wikimedia.org/P31068 and previous config saved to /var/cache/conftool/dbconfig/20220714-135000-ladsgroup.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31067 and previous config saved to /var/cache/conftool/dbconfig/20220714-134836-marostegui.json
  • 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 mlitn@deploy1002: Started scap: Backport: Improve maint script output & update i18n messages
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31065 and previous config saved to /var/cache/conftool/dbconfig/20220714-133455-ladsgroup.json
  • 13:34 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update boosts for weighted_tags (duration: 02m 45s)
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31064 and previous config saved to /var/cache/conftool/dbconfig/20220714-133331-marostegui.json
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31063 and previous config saved to /var/cache/conftool/dbconfig/20220714-133051-marostegui.json
  • 13:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31062 and previous config saved to /var/cache/conftool/dbconfig/20220714-133031-marostegui.json
  • 13:30 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add custommatch search feature config for commons (duration: 02m 58s)
  • 13:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:NewLexemeAlpha on Wikidata and TestWikidata (T306016) (re-sync, config change seemingly not consistently picked up) (duration: 02m 45s)
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P31061 and previous config saved to /var/cache/conftool/dbconfig/20220714-131950-ladsgroup.json
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:15 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:NewLexemeAlpha on Wikidata and TestWikidata (T306016) (duration: 02m 57s)
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31060 and previous config saved to /var/cache/conftool/dbconfig/20220714-131525-marostegui.json
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312984)', diff saved to https://phabricator.wikimedia.org/P31059 and previous config saved to /var/cache/conftool/dbconfig/20220714-130445-ladsgroup.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31058 and previous config saved to /var/cache/conftool/dbconfig/20220714-130020-marostegui.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31057 and previous config saved to /var/cache/conftool/dbconfig/20220714-124515-marostegui.json
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T312984)', diff saved to https://phabricator.wikimedia.org/P31056 and previous config saved to /var/cache/conftool/dbconfig/20220714-124321-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312977)', diff saved to https://phabricator.wikimedia.org/P31055 and previous config saved to /var/cache/conftool/dbconfig/20220714-124239-marostegui.json
  • 12:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312977)', diff saved to https://phabricator.wikimedia.org/P31054 and previous config saved to /var/cache/conftool/dbconfig/20220714-124219-marostegui.json
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31053 and previous config saved to /var/cache/conftool/dbconfig/20220714-122714-marostegui.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31052 and previous config saved to /var/cache/conftool/dbconfig/20220714-121209-marostegui.json
  • 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312977)', diff saved to https://phabricator.wikimedia.org/P31051 and previous config saved to /var/cache/conftool/dbconfig/20220714-115701-marostegui.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312977)', diff saved to https://phabricator.wikimedia.org/P31050 and previous config saved to /var/cache/conftool/dbconfig/20220714-115448-marostegui.json
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312977)', diff saved to https://phabricator.wikimedia.org/P31049 and previous config saved to /var/cache/conftool/dbconfig/20220714-115316-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31048 and previous config saved to /var/cache/conftool/dbconfig/20220714-113811-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31047 and previous config saved to /var/cache/conftool/dbconfig/20220714-112304-marostegui.json
  • 11:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T312977)', diff saved to https://phabricator.wikimedia.org/P31046 and previous config saved to /var/cache/conftool/dbconfig/20220714-110759-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2164 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P31038 and previous config saved to /var/cache/conftool/dbconfig/20220714-052056-marostegui.json
  • 05:07 AndyRussG: update payments-wiki-staging 10304f69 -> be11fac2
  • 04:32 oblivian@puppetmaster1001: conftool action : edit; selector: name=ReadOnly,scope=codfw
  • 04:25 tstarling@puppetmaster1001: conftool action : edit; selector: name=ReadOnly,scope=codfw
  • 04:23 tstarling@puppetmaster1001: conftool action : get/ReadOnly; selector: name=ReadOnly,scope=codfw
  • 01:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I73fbfee8248c (duration: 02m 56s)
  • 01:09 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I73fbfee8248c (duration: 02m 45s)
  • 01:03 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: Ie11bdf (duration: 02m 55s)
  • 01:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:44 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: Ie11bdf (duration: 02m 55s)
  • 00:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:29 krinkle@deploy1002: Synchronized wmf-config/wikitech.php: Ib539da0c0953 (duration: 02m 47s)
  • 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-07-13

  • 22:17 inflatador: bking@elastic2055 successfully staged NIC firmware updates for elastic2055-2060
  • 22:09 inflatador: bking@elastic2055 staging NIC firmware updates for elastic2055-2060
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:09 Lucas_WMDE: UTC late backport+config window done
  • 21:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable DiscussionTools beta feature at mediawikiwiki (T310960) (duration: 02m 47s)
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:02 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: QuickSurveys: Undeploy 'research-incentive' (T311015) (2/2, beta) (duration: 02m 58s)
  • 20:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: QuickSurveys: Undeploy 'research-incentive' (T311015) (1/2, prod) (duration: 02m 48s)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/CommentItem.js: Backport: Avoid localized digits in internal timestamps in JS (T312828) (duration: 02m 49s)
  • 20:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2040.codfw.wmnet with OS bullseye
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/extension-list: Config: Undeploy CongressLookup (part 3) (T312894) (duration: 03m 00s)
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Undeploy CongressLookup (part 2) (T312894) (duration: 02m 53s)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Undeploy CongressLookup (part 1) (T312894) (duration: 03m 04s)
  • 20:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2040.codfw.wmnet with reason: host reimage
  • 20:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2040.codfw.wmnet with reason: host reimage
  • 19:59 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2040.codfw.wmnet with OS bullseye
  • 18:20 sukhe: upload pdns-recursor_4.6.2-1+wmf11u1 to apt.wm.org (bullseye) - T305589
  • 17:54 sukhe: upload dnsdist_1.7.2-1+wmf11u1 to apt.wm.org (bullseye) - T305589
  • 17:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 17:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:17 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@e58e61d]: (no justification provided) (duration: 00m 10s)
  • 16:17 milimetric@deploy1002: Started deploy [airflow-dags/analytics@e58e61d]: (no justification provided)
  • 15:59 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2040.codfw.wmnet with OS bullseye
  • 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
  • 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:56 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2040.codfw.wmnet with OS bullseye
  • 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:12 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9edd1ab]: Deploy [airflow-dags/analytics@9edd1ab] (duration: 00m 10s)
  • 15:12 aqu@deploy1002: Started deploy [airflow-dags/analytics@9edd1ab]: Deploy [airflow-dags/analytics@9edd1ab]
  • 15:10 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9edd1ab]: Deploy [airflow-dags/analytics_test@9edd1ab] (duration: 00m 08s)
  • 15:10 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9edd1ab]: Deploy [airflow-dags/analytics_test@9edd1ab]
  • 14:52 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
  • 14:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
  • 14:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05] (duration: 00m 12s)
  • 14:34 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05]
  • 14:19 aqu: Deployed refinery using scap, then deployed onto hdfs
  • 14:11 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
  • 14:08 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67] (duration: 07m 42s)
  • 14:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
  • 14:01 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67]
  • 14:00 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67] (duration: 00m 07s)
  • 14:00 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67]
  • 13:47 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from x1 master', diff saved to https://phabricator.wikimedia.org/P31037 and previous config saved to /var/cache/conftool/dbconfig/20220713-134413-marostegui.json
  • 13:37 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
  • 13:20 Lucas_WMDE: UTC afternoon backport window done
  • 13:20 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host elastic2049.codfw.wmnet
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure wgLexemeLexicalCategoryItemIds on Wikidata (T307441) (duration: 02m 45s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure $wgBabelCategoryNames on Test Wikidata (T312920) (duration: 02m 51s)
  • 13:05 inflatador: bking@elastic2049 rebooting for read-only fs
  • 13:04 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2049.codfw.wmnet
  • 12:49 damilare: payments-wiki upgraded from 2f95d8b4 to 6a8aa302
  • 12:12 moritzm: draining ganeti2028 T311686
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2018.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2018.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: codfw s8 sanitarium master switch
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: codfw s8 sanitarium master switch
  • 10:42 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train (2nd try. --force) [analytics/refinery@bd39e67] (duration: 04m 52s)
  • 10:38 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67]: Regular analytics weekly train (2nd try. --force) [analytics/refinery@bd39e67]
  • 10:27 moritzm: draining ganeti1028 T311686
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2012.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 10:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2012.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31035 and previous config saved to /var/cache/conftool/dbconfig/20220713-090748-ladsgroup.json
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31034 and previous config saved to /var/cache/conftool/dbconfig/20220713-085244-ladsgroup.json
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31033 and previous config saved to /var/cache/conftool/dbconfig/20220713-083740-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31032 and previous config saved to /var/cache/conftool/dbconfig/20220713-082236-ladsgroup.json
  • 08:05 jayme: 'systemctl restart rsyslog' on kubernetes2007.codfw.wmnet,kubernetes2010.codfw.wmnet,kubernetes2014.codfw.wmnet,kubernetes2020.codfw.wmnet,kubernetes2009.codfw.wmnet
  • 07:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 07:52 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 07:51 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 07:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31031 and previous config saved to /var/cache/conftool/dbconfig/20220713-070229-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31030 and previous config saved to /var/cache/conftool/dbconfig/20220713-064725-root.json
  • 06:45 aqu: analytics/refinery deploy aborted, no more space to deploy in /srv on an-launcher1002 eqiad
  • 06:44 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67] (duration: 27m 02s)
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31029 and previous config saved to /var/cache/conftool/dbconfig/20220713-063221-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31028 and previous config saved to /var/cache/conftool/dbconfig/20220713-061717-root.json
  • 06:16 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67]
  • 06:16 aqu: analytics/refinery deployment
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31027 and previous config saved to /var/cache/conftool/dbconfig/20220713-060213-root.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31026 and previous config saved to /var/cache/conftool/dbconfig/20220713-054709-root.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31025 and previous config saved to /var/cache/conftool/dbconfig/20220713-053205-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31024 and previous config saved to /var/cache/conftool/dbconfig/20220713-051701-root.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2162 in s8 T311493', diff saved to https://phabricator.wikimedia.org/P31023 and previous config saved to /var/cache/conftool/dbconfig/20220713-051239-marostegui.json

2022-07-12

  • 22:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2039.codfw.wmnet with OS bullseye
  • 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec (duration: 02m 04s)
  • 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec
  • 22:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage
  • 22:11 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage
  • 21:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2039.codfw.wmnet with OS bullseye
  • 20:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2038.codfw.wmnet with OS bullseye
  • 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage
  • 20:07 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage
  • 19:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:38 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye
  • 19:35 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:34 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye
  • 19:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye
  • 19:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:30 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye
  • 19:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
  • 19:26 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I3071c009c (2) (duration: 02m 45s)
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:20 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I3071c009c (duration: 03m 09s)
  • 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update T312298
  • 19:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update T312298
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:13 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic1065.eqiad.wmnet
  • 19:13 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic1065.eqiad.wmnet
  • 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2037.codfw.wmnet with OS bullseye
  • 16:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage
  • 16:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage
  • 16:55 bblack: codfw dns repooled for front edge traffic
  • 16:50 herron: ran failed codfw puppet agents
  • 16:47 mutante: doc1002 - systemctl reset-failed
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1026.eqiad.wmnet
  • 16:36 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 16:19 mutante: rebooting mwdebug2001 via ganeti2022
  • 16:15 cwhite: repair networking on people2002
  • 16:11 cwhite: repair networking on puppetdb2002
  • 16:10 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1026.eqiad.wmnet
  • 16:05 mutante: parse200[1-3] - restarted ferm
  • 16:03 mutante: mw2401 through mw2410 - performing ferm restarts (without cumin, which has its own issue)
  • 15:57 mutante: mw2405 - restarted ferm
  • 15:50 bblack: codfw dns depooled for front edge traffic
  • 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic1065.eqiad.wmnet with reason: firmware update T312298
  • 15:48 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic1065.eqiad.wmnet with reason: firmware update T312298
  • 15:30 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 15:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 15:06 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 15:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 15:06 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2037.codfw.wmnet with OS bullseye
  • 15:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 14:56 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 14:52 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:48 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
  • 14:48 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:47 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2037.codfw.wmnet with OS bullseye
  • 14:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on druid1008.eqiad.wmnet with reason: T308331 btullis
  • 14:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on druid1008.eqiad.wmnet with reason: T308331 btullis
  • 14:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
  • 14:30 papaul: ongoing PDU maintenance in rack A5
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2037.codfw.wmnet
  • 13:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
  • 13:41 Lucas_WMDE: UTC afternoon backport window done
  • 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/CommentItem.js: Backport: Parse 'DiscussionToolsTimestampFormatSwitchTime' config value as UTC (T312828) (duration: 02m 50s)
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Rack move, T308331
  • 12:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Rack move, T308331
  • 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to x1 master until the replica is back from maintenance', diff saved to https://phabricator.wikimedia.org/P31018 and previous config saved to /var/cache/conftool/dbconfig/20220712-101246-marostegui.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for onsite maintenance T308331', diff saved to https://phabricator.wikimedia.org/P31017 and previous config saved to /var/cache/conftool/dbconfig/20220712-101211-root.json
  • 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 09:12 hashar: Restarted Zuul T309371
  • 08:58 hashar: Restarted Gerrit T309371
  • 08:25 hashar@deploy1002: Finished deploy [integration/docroot@c2cceaf]: Fix NPM URL for Wikimedia language-data library (duration: 00m 08s)
  • 08:25 hashar@deploy1002: Started deploy [integration/docroot@c2cceaf]: Fix NPM URL for Wikimedia language-data library
  • 07:10 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@89cb17d]: subgraph_and_query_mapping: Increase executor memory to 12g, use repartition (duration: 02m 02s)
  • 07:08 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@89cb17d]: subgraph_and_query_mapping: Increase executor memory to 12g, use repartition
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P31014 and previous config saved to /var/cache/conftool/dbconfig/20220712-070240-root.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31013 and previous config saved to /var/cache/conftool/dbconfig/20220712-065352-root.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31012 and previous config saved to /var/cache/conftool/dbconfig/20220712-063848-root.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31011 and previous config saved to /var/cache/conftool/dbconfig/20220712-062344-root.json
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:12 marostegui: dbmaint s3@eqiad T310011
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 T311610', diff saved to https://phabricator.wikimedia.org/P31010 and previous config saved to /var/cache/conftool/dbconfig/20220712-060407-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1157 to s3 primary and set section read-write T311610', diff saved to https://phabricator.wikimedia.org/P31009 and previous config saved to /var/cache/conftool/dbconfig/20220712-060058-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T311610', diff saved to https://phabricator.wikimedia.org/P31008 and previous config saved to /var/cache/conftool/dbconfig/20220712-060031-marostegui.json
  • 06:00 marostegui: Starting s3 eqiad failover from db1123 to db1157 - T311610
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1157 with weight 0 T311610', diff saved to https://phabricator.wikimedia.org/P31007 and previous config saved to /var/cache/conftool/dbconfig/20220712-051927-root.json
  • 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: Primary switchover s3 T311610
  • 05:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: Primary switchover s3 T311610
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:10 ejegg: updated payments-wiki from 53a7b7bd to 2f95d8b4

2022-07-11

  • 21:49 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3ba1d4c]: subgraph_query_mapping_daily: Increase partitioning to 2048 (duration: 02m 02s)
  • 21:47 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3ba1d4c]: subgraph_query_mapping_daily: Increase partitioning to 2048
  • 20:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@a559f82]: subgraph: Use HivePartitionRangeSensor to wait for sparql queries (duration: 02m 00s)
  • 20:36 TheresNoTime: UTC late deploys done
  • 20:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@a559f82]: subgraph: Use HivePartitionRangeSensor to wait for sparql queries
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikibaseTermboxInteraction from EventLogging to EventGate on all wikis (T290303) (duration: 02m 53s)
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I82262e try again ref T311788 (duration: 03m 07s)
  • 19:41 hashar@deploy1002: Finished deploy [integration/docroot@fc5d65a]: Add language-data library (duration: 00m 08s)
  • 19:41 hashar@deploy1002: Started deploy [integration/docroot@fc5d65a]: Add language-data library
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P31005 and previous config saved to /var/cache/conftool/dbconfig/20220711-193315-marostegui.json
  • 18:32 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:10 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@02ab1c2]: use mode=reschedule on all airflow sensors (duration: 02m 02s)
  • 16:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@02ab1c2]: use mode=reschedule on all airflow sensors
  • 16:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 16:11 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I82262e (duration: 02m 55s)
  • 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:56 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2175.codfw.wmnet with OS bullseye
  • 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1005.wikimedia.org with reason: host reimage
  • 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1005.wikimedia.org with reason: host reimage
  • 15:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 51s)
  • 15:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2175.codfw.wmnet with reason: host reimage
  • 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 58s)
  • 15:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2175.codfw.wmnet with reason: host reimage
  • 15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:27 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:23 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2175.codfw.wmnet with OS bullseye
  • 15:08 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:34 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:11 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:53 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:53 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:53 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
  • 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2163 to s8 T311493', diff saved to https://phabricator.wikimedia.org/P31002 and previous config saved to /var/cache/conftool/dbconfig/20220711-130441-marostegui.json
  • 12:05 moritzm: updated bullseye netboot image for Bullseye 11.4 point release T312637
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AniketArs out of all services on: 1292 hosts
  • 10:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AniketArs out of all services on: 1292 hosts
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AniketArs out of all services on: 663 hosts
  • 10:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AniketArs out of all services on: 663 hosts
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2027.codfw.wmnet to cluster codfw and group A
  • 08:06 godog: trim thanos raw samples retention to 54w - T311690
  • 08:04 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2027.codfw.wmnet to cluster codfw and group A
  • 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 07:52 godog: roll-restart swift-account swift-container across swift/thanos bullseye hosts - T297959
  • 07:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:43 taavi@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/PageTriage/includes/HookHandlers/UndeleteHookHandler.php: Backport: UndeleteHookHandler: fix namespace conditional (T311347) (duration: 02m 54s)
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS bullseye
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2080 from dbctl T312618', diff saved to https://phabricator.wikimedia.org/P30999 and previous config saved to /var/cache/conftool/dbconfig/20220711-073346-marostegui.json
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2080.codfw.wmnet
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2027.codfw.wmnet with reason: host reimage
  • 07:26 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2027.codfw.wmnet with reason: host reimage
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2080.codfw.wmnet
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS bullseye
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2077.codfw.wmnet
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2077.codfw.wmnet
  • 06:28 _joe_: repool thumbor1005
  • 06:28 _joe_: depooled thumbor1005, downgraded firejail, restarted units
  • 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply

2022-07-10

  • 13:48 godog: silence ProbeDown pages for thumbor:8800 until wed

2022-07-09

  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:48 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: I3e43b1 (duration: 03m 37s)
  • 01:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:35 krinkle@deploy1002: Synchronized wmf-config/: I1bb97d1d601 (duration: 03m 24s)
  • 01:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-07-08

  • 21:44 ryankemper: [Elastic] Reshuffled shards on eqiad to get cluster back into green status (from yellow): https://phabricator.wikimedia.org/P30995#130117
  • 21:32 ori: apt1001: reprepro -C main include buster-wikimedia libvmod-querysort_0.2_amd64.changes
  • 19:58 thcipriani: quick phab downtime for deploy to fix T312614
  • 19:57 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
  • 19:57 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
  • 19:57 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
  • 19:56 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
  • 19:56 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
  • 19:56 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
  • 19:49 tzatziki: removing 2 files for legal compliance
  • 18:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1001.wikimedia.org with OS bullseye
  • 18:26 urandom: changing Cassandra superuser password, AQS cluster -- T311652
  • 18:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage
  • 18:18 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage
  • 18:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1001.wikimedia.org with OS bullseye
  • 16:25 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:29 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:27 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:15 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:59 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
  • 14:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1004.wikimedia.org with OS bullseye
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30990 and previous config saved to /var/cache/conftool/dbconfig/20220708-143411-root.json
  • 14:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage
  • 14:22 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30983 and previous config saved to /var/cache/conftool/dbconfig/20220708-141907-root.json
  • 14:11 hashar@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: AddImage: Only process metadata for a single valid suggestion - T312544 (duration: 03m 25s)
  • 14:09 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30978 and previous config saved to /var/cache/conftool/dbconfig/20220708-140404-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30975 and previous config saved to /var/cache/conftool/dbconfig/20220708-134900-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30974 and previous config saved to /var/cache/conftool/dbconfig/20220708-133356-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30973 and previous config saved to /var/cache/conftool/dbconfig/20220708-131852-root.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30971 and previous config saved to /var/cache/conftool/dbconfig/20220708-130348-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30970 and previous config saved to /var/cache/conftool/dbconfig/20220708-124844-root.json
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deneb.codfw.wmnet
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts deneb.codfw.wmnet
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage
  • 09:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2016.codfw.wmnet to cluster codfw and group D
  • 07:33 akosiaris: reboot rdb1009 for kernel upgrades
  • 07:29 vgutierrez: restart pybal on lvs6002
  • 07:22 akosiaris: reboot rdb1010 for kernel upgrades
  • 06:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2016.codfw.wmnet to cluster codfw and group D
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 06:47 TimStarling: on mwmaint2002: using iptables to simulate cross-DC memcached traffic loss
  • 06:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 06:05 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Switch $wgCentralAuthTokenCacheType to mcrouter-primary-dc (duration: 03m 18s)
  • 06:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2016.codfw.wmnet with OS bullseye
  • 06:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2077 from dbctl T312191', diff saved to https://phabricator.wikimedia.org/P30963 and previous config saved to /var/cache/conftool/dbconfig/20220708-055334-marostegui.json
  • 05:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2016.codfw.wmnet with reason: host reimage
  • 05:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2016.codfw.wmnet with reason: host reimage
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2076.codfw.wmnet
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2076.codfw.wmnet
  • 05:31 moritzm: draining ganeti2027 T311686
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2076 from dbctl T312190', diff saved to https://phabricator.wikimedia.org/P30962 and previous config saved to /var/cache/conftool/dbconfig/20220708-052926-marostegui.json
  • 05:26 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2016.codfw.wmnet with OS bullseye
  • 05:23 marostegui: dbmaint s3@eqiad T312574
  • 04:08 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@b5d49fe]: use mode=reschedule on all airflow sensors (duration: 02m 03s)
  • 04:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@b5d49fe]: use mode=reschedule on all airflow sensors
  • 03:33 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 03:22 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1004.wikimedia.org with OS bullseye
  • 02:27 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@c271774]: Update rdf-spark-tools to 0.3.112 (duration: 02m 13s)
  • 02:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye
  • 02:25 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 02:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@c271774]: Update rdf-spark-tools to 0.3.112
  • 02:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: RL use MainStash on dewiki I1c120d64d226 (duration: 03m 21s)
  • 01:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2182.codfw.wmnet with OS bullseye
  • 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
  • 01:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
  • 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS bullseye
  • 01:12 mutante: gitlab1004 - _still_ icinga alerts about rsync to decom'ed host. 'systemctl daemon-reload' to teach it about deleted units, then 'systemctl reset-failed', then RECOVERY T307142
  • 00:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2181.codfw.wmnet with OS bullseye

2022-07-07

  • 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: host reimage
  • 23:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: host reimage
  • 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2180.codfw.wmnet with OS bullseye
  • 23:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: host reimage
  • 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2181.codfw.wmnet with OS bullseye
  • 23:26 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I9b97f79618 (duration: 03m 23s)
  • 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2180.codfw.wmnet with reason: host reimage
  • 23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2179.codfw.wmnet with OS bullseye
  • 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
  • 22:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
  • 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:56 krinkle@deploy1002: Synchronized multiversion/: I1f2daab316 (duration: 03m 43s)
  • 22:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2178.codfw.wmnet with reason: host reimage
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2178.codfw.wmnet with reason: host reimage
  • 22:42 krinkle@deploy1002: Synchronized wmf-config/missing.php: I13a4ba0e307a (duration: 03m 33s)
  • 22:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2179.codfw.wmnet with OS bullseye
  • 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2177.codfw.wmnet with OS bullseye
  • 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2178.codfw.wmnet with OS bullseye
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2176.codfw.wmnet with OS bullseye
  • 22:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
  • 22:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2176.codfw.wmnet with reason: host reimage
  • 21:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2177.codfw.wmnet with OS bullseye
  • 21:33 krinkle@deploy1002: Synchronized multiversion/: Ice5302 (duration: 03m 18s)
  • 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:28 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: Ice5302 (duration: 03m 18s)
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2177.codfw.wmnet with OS bullseye
  • 20:55 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikibaseTermboxInteraction from EventLogging to EventGate on testwiki (T290303) (duration: 03m 12s)
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e0a8f03]: tune subgraph_mapping_weekly based on first prod run (duration: 02m 05s)
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:49 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e0a8f03]: tune subgraph_mapping_weekly based on first prod run
  • 20:49 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.19/includes/parser/ParserOutput.php: Backport: ParserOutput::mergeMapStrategy: don't crash if merging non-array values (T312242) (duration: 03m 05s)
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:38 thcipriani@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: Config: Enable VisualEditor on thwikibooks by default (T308379) (duration: 03m 13s)
  • 20:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2176.codfw.wmnet with OS bullseye
  • 20:34 thcipriani@deploy1002: Synchronized wmf-config/config/thwikibooks.yaml: Config: Enable VisualEditor on thwikibooks by default (T308379) (duration: 03m 25s)
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1012.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2181.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2182.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1012.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudweb1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudweb1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc1001.eqiad.wmnet
  • 20:11 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:08 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:03 mutante: destroying former stretch backend of doc.wikimedia.org, replaced by doc1002 on buster (T247653)
  • 20:03 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts doc1001.eqiad.wmnet
  • 20:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2182.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2181.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2180.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2179.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.wikimedia.org with reason: host reimage
  • 19:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.wikimedia.org with reason: host reimage
  • 19:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.wikimedia.org with OS bullseye
  • 19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 19:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:10 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:07 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.wikimedia.org with reason: host reimage
  • 19:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2180.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.wikimedia.org with reason: host reimage
  • 18:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2179.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2178.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.wikimedia.org with OS bullseye
  • 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 18:46 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 18:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 18:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 18:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 18:26 brett@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2178.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:22 brett@cumin1001: START - Cookbook sre.dns.netbox
  • 18:22 brett@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:16 brett@cumin1001: START - Cookbook sre.dns.netbox
  • 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:51 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:51 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:39 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1002.wikimedia.org with OS bullseye
  • 17:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 17:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
  • 17:12 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1002.wikimedia.org with reason: host reimage
  • 17:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1002.wikimedia.org with reason: host reimage
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.wikimedia.org with OS bullseye
  • 16:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1002.wikimedia.org with OS bullseye
  • 16:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 16:48 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 16:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.wikimedia.org with OS bullseye
  • 16:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.wikimedia.org with reason: host reimage
  • 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.wikimedia.org with reason: host reimage
  • 16:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 16:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 16:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.wikimedia.org with OS bullseye
  • 16:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
  • 16:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1003.wikimedia.org with reason: host reimage
  • 16:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
  • 16:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 16:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1003.wikimedia.org with reason: host reimage
  • 16:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bullseye
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30959 and previous config saved to /var/cache/conftool/dbconfig/20220707-160308-root.json
  • 16:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 16:01 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:01 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 15:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30958 and previous config saved to /var/cache/conftool/dbconfig/20220707-154804-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30957 and previous config saved to /var/cache/conftool/dbconfig/20220707-153300-root.json
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30956 and previous config saved to /var/cache/conftool/dbconfig/20220707-151756-root.json
  • 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2016.codfw.wmnet with reason: Drop from ganeti cluster for eventual reimage
  • 15:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2016.codfw.wmnet with reason: Drop from ganeti cluster for eventual reimage
  • 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2010.codfw.wmnet to cluster codfw and group C
  • 15:09 moritzm: installing containerd security updates
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30955 and previous config saved to /var/cache/conftool/dbconfig/20220707-150252-root.json
  • 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to cluster codfw and group C
  • 14:54 reedy@deploy1002: Synchronized composer.json: Cleanup (duration: 03m 19s)
  • 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30953 and previous config saved to /var/cache/conftool/dbconfig/20220707-144748-root.json
  • 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 14:41 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to cluster codfw and group C
  • 14:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to cluster codfw and group C
  • 14:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30952 and previous config saved to /var/cache/conftool/dbconfig/20220707-143244-root.json
  • 14:28 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 14:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 14:23 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30951 and previous config saved to /var/cache/conftool/dbconfig/20220707-141740-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1132', diff saved to https://phabricator.wikimedia.org/P30950 and previous config saved to /var/cache/conftool/dbconfig/20220707-141724-marostegui.json
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 14:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 13:49 moritzm: draining ganeti2016 T311686
  • 13:44 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: 95c38bd: ServiceImageRecommendationProvider: Dont fail on first validation error (T312521) (duration: 03m 24s)
  • 13:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/Translate/tag/PageTranslationHooks.php: af51745: Translation unit deletion: Skip translation update if it doesnt exist (T312293) (duration: 03m 32s)
  • 13:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:31 urbanecm@deploy1002: Synchronized wmf-config/: aa1d8c8: GrowthExperiments: Set GEImageRecommendationApiHandler (T306032; 2/2) (duration: 03m 37s)
  • 13:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:27 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: aa1d8c8: GrowthExperiments: Set GEImageRecommendationApiHandler (T306032; 1/2) (duration: 03m 20s)
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:24 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: df1393f: ServiceImageRecommendationProvider: Dont fail on first validation error (T312521) (duration: 03m 30s)
  • 13:21 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ganeti2010.codfw.wmnet with OS bullseye
  • 13:21 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 13:20 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:19 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 elukey: roll restart eventgate-main pods to add a new stream - T301878
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2165 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30948 and previous config saved to /var/cache/conftool/dbconfig/20220707-131852-marostegui.json
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2010.codfw.wmnet with reason: host reimage
  • 13:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2010.codfw.wmnet with reason: host reimage
  • 12:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS bullseye
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
  • 12:37 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
  • 12:22 moritzm: draining ganeti2015 T311686
  • 11:53 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:49 jayme: rolling back helm release eventstreams-internal/main to revision 3 on eqiad and codfw clusters because it's pending-upgrade since Mon Mar 21 21:36:56 2022 / Mon Mar 21 16:05:54 2022
  • 11:42 jayme@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:40 jayme: rolling back helm release tegola-vector-tiles/main to revision 11 on staging-eqiad because it's pending-upgrade since Mon Jun 27 09:45:56 2022
  • 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:00 moritzm: installing intel-microcode security updates
  • 10:59 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 10:59 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 10:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 10:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 10:32 moritzm: draining ganeti2010 T311686
  • 10:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 09:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:44 moritzm: installing 5.10.120-1~bpo10+1 kernels on buster hosts running Linux 5.10
  • 09:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8599f39: Declare mediawiki.editgrowthconfig schema (T312148) (duration: 03m 37s)
  • 09:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:38 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:37 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:35 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:33 marostegui: dbmaint s3@eqiad T312285
  • 09:33 marostegui: dbmaint s7@eqiad T312285
  • 09:33 marostegui: dbmaint s2@eqiad T312285
  • 09:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbstore1007.eqiad.wmnet
  • 09:31 marostegui: dbmaint s6@eqiad T312285
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2161 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30940 and previous config saved to /var/cache/conftool/dbconfig/20220707-092424-marostegui.json
  • 09:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1007.eqiad.wmnet
  • 09:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbstore1005.eqiad.wmnet
  • 09:17 moritzm: draining ganeti2009 T311686
  • 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1005.eqiad.wmnet
  • 09:11 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dbstore1003.eqiad.wmnet
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type back to plain
  • 09:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type back to plain
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2074', diff saved to https://phabricator.wikimedia.org/P30938 and previous config saved to /var/cache/conftool/dbconfig/20220707-090700-marostegui.json
  • 09:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1003.eqiad.wmnet
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2074.codfw.wmnet
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 08:47 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2074.codfw.wmnet
  • 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:08 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.19 refs T308072
  • 07:31 marostegui: dbmaint s3@eqiad T312286
  • 07:29 marostegui: dbmaint s7@eqiad T312286
  • 07:29 marostegui: dbmaint s2@eqiad T312286
  • 07:28 marostegui: dbmaint s6@eqiad T312286
  • 07:27 apergos: UTC morning backport and config training window closed
  • 07:23 marostegui: dbmaint s3@eqiad T312287
  • 07:20 marostegui: dbmaint s6@eqiad T312287
  • 07:19 marostegui: dbmaint s7@eqiad T312287
  • 07:19 marostegui: dbmaint s2@eqiad T312287
  • 07:14 kartik@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/ContentTranslation/modules/mw.cx.MachineTranslationManager.js: Backport: Update MT label for Flores (T311411) (duration: 03m 20s)
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:07 kartik@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/ContentTranslation/modules/mw.cx.MachineTranslationManager.js: Backport: Update MT label for Flores (T311411) (duration: 03m 41s)
  • 07:07 moritzm: drain ganeti1020 T308331
  • 07:07 marostegui: dbmaint s3@eqiad T312288
  • 07:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:03 marostegui: dbmaint s6@eqiad T312288
  • 07:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:00 marostegui: dbmaint s2@eqiad T312288
  • 06:56 marostegui: dbmaint s7@eqiad T312288
  • 06:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1160 T311611', diff saved to https://phabricator.wikimedia.org/P30937 and previous config saved to /var/cache/conftool/dbconfig/20220707-060743-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 primary and set section read-write T311611', diff saved to https://phabricator.wikimedia.org/P30936 and previous config saved to /var/cache/conftool/dbconfig/20220707-060112-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T311611', diff saved to https://phabricator.wikimedia.org/P30935 and previous config saved to /var/cache/conftool/dbconfig/20220707-060037-ladsgroup.json
  • 06:00 Amir1: Starting s4 eqiad failover from db1160 to db1138 - T311611
  • 05:35 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T311611', diff saved to https://phabricator.wikimedia.org/P30933 and previous config saved to /var/cache/conftool/dbconfig/20220707-051406-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s4 T311611
  • 05:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s4 T311611
  • 01:09 mutante: gitlab1004 - systemctl reset-failed, clear icinga alerts about rsync to decom'ed machine
  • 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:25 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab1001.wikimedia.org
  • 00:25 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)

2022-07-06

  • 23:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5082f17]: increase subgraph_mapping_weekly executor memory (duration: 02m 05s)
  • 23:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5082f17]: increase subgraph_mapping_weekly executor memory
  • 23:30 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 23:25 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab1001.wikimedia.org
  • 23:00 mutante: gitlab1004 - rm /lib/systemd/system/rsync-config-backup-gitlab1001* T307142
  • 22:52 mutante: etherpad - deleted 2 pads that had leaked information
  • 22:52 ebernhardson: restart airflow-webserver and airflow-scheduler for plugins update on an-airflow1001
  • 22:37 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@debd402]: airflow dags to generate subgraph and query mapping along with their metrics (duration: 02m 01s)
  • 22:35 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@debd402]: airflow dags to generate subgraph and query mapping along with their metrics
  • 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1005.wikimedia.org with OS bullseye
  • 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 21:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 21:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1002.wikimedia.org with OS bullseye
  • 20:59 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 20:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.wikimedia.org with OS bullseye
  • 20:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
  • 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bullseye
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 20:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 20:35 cjming: end of UTC late backport window
  • 20:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable sticky header edit A/B test for pilot wikis excluding idwiki/viwiki (T311144) (duration: 03m 25s)
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:54 bd808@mwmaint1002: Testing stashbot following deploy of gerrit:809732. This should be logged in SAL, but stashbot should not say that was done on IRC.
  • 19:13 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 19:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:48 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:48 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:47 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:45 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 18:45 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
  • 18:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - T309343
  • 17:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:55 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudcephmon1002.eqiad.wmnet with reason: Moving racks
  • 17:55 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudcephmon1002.eqiad.wmnet with reason: Moving racks
  • 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:06 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 38s)
  • 17:06 inflatador: bking@cloudelastic1006 "restarting elastic services in preparation for cloudelastic reimage T309343"
  • 16:07 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl1002.eqiad.wmnet
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1002.eqiad.wmnet on all recursors
  • 15:57 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1002.eqiad.wmnet on all recursors
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:53 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1002.eqiad.wmnet
  • 15:51 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl1001.eqiad.wmnet
  • 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
  • 15:41 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
  • 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:37 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1001.eqiad.wmnet
  • 15:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:09 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 41s)
  • 15:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:05 moritzm: installing intel-microcode security updates
  • 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:00 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 28s)
  • 14:56 cmjohnson1: moving switch ports cloudcephosd1021 from cloudsw1-c to cloudsw2-c T310546
  • 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:53 akosiaris: reboot poolcounter1005 for kernel upgrades
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:49 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 33s)
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1009.wikimedia.org
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:34 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:32 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 14:32 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 14:30 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 14:30 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 14:27 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 14:26 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 14:22 akosiaris: depool eqiad kartotherian T305845
  • 14:22 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 14:17 akosiaris: pool codfw for kartotherian T305845
  • 14:16 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1009.wikimedia.org
  • 14:15 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1008.wikimedia.org
  • 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:05 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1008.wikimedia.org
  • 13:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:54 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.addnode (exit_code=97) for new host ganeti2024.codfw.wmnet to cluster codfw and group A
  • 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add a new Eventgate stream for revision-score events (T301878) (duration: 03m 46s)
  • 13:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2024.codfw.wmnet to cluster codfw and group A
  • 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 13:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 13:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
  • 13:30 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1003.eqiad.wmnet
  • 13:28 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
  • 13:28 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
  • 13:28 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
  • 13:20 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
  • 13:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
  • 13:19 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
  • 13:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132 (T311106)', diff saved to https://phabricator.wikimedia.org/P30930 and previous config saved to /var/cache/conftool/dbconfig/20220706-131715-ladsgroup.json
  • 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
  • 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
  • 13:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
  • 13:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudstore1009.wikimedia.org
  • 13:04 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudstore1009.wikimedia.org
  • 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudstore1008.wikimedia.org
  • 13:03 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudstore1008.wikimedia.org
  • 13:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
  • 13:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
  • 12:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
  • 12:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
  • 12:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1008.wikimedia.org
  • 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
  • 12:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
  • 12:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:43 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1008.wikimedia.org
  • 12:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1003.eqiad.wmnet on all recursors
  • 12:41 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1003.eqiad.wmnet on all recursors
  • 12:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:40 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 12:28 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 12:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
  • 12:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
  • 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
  • 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
  • 12:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
  • 12:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
  • 11:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
  • 11:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
  • 11:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
  • 11:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1032.eqiad.wmnet
  • 11:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1032.eqiad.wmnet
  • 11:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
  • 11:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2031.codfw.wmnet
  • 11:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
  • 11:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
  • 11:28 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2031.codfw.wmnet
  • 11:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
  • 11:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
  • 11:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
  • 11:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
  • 11:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
  • 11:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30927 and previous config saved to /var/cache/conftool/dbconfig/20220706-110658-root.json
  • 10:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
  • 10:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
  • 10:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
  • 10:54 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 10:54 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1003.eqiad.wmnet
  • 10:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
  • 10:52 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1002.eqiad.wmnet
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30925 and previous config saved to /var/cache/conftool/dbconfig/20220706-105154-root.json
  • 10:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
  • 10:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
  • 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1002.eqiad.wmnet on all recursors
  • 10:42 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1002.eqiad.wmnet on all recursors
  • 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet
  • 10:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1009.eqiad.wmnet
  • 10:37 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 10:37 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1002.eqiad.wmnet
  • 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1001.eqiad.wmnet
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30923 and previous config saved to /var/cache/conftool/dbconfig/20220706-103650-root.json
  • 10:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet
  • 10:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1009.eqiad.wmnet
  • 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1001.eqiad.wmnet on all recursors
  • 10:27 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1001.eqiad.wmnet on all recursors
  • 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:22 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 10:22 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1001.eqiad.wmnet
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30921 and previous config saved to /var/cache/conftool/dbconfig/20220706-102146-root.json
  • 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:19 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2024.codfw.wmnet
  • 10:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30920 and previous config saved to /var/cache/conftool/dbconfig/20220706-100642-root.json
  • 10:02 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 09:59 volans: restarted wikibugs
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30919 and previous config saved to /var/cache/conftool/dbconfig/20220706-095138-root.json
  • 09:50 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30918 and previous config saved to /var/cache/conftool/dbconfig/20220706-093752-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30917 and previous config saved to /var/cache/conftool/dbconfig/20220706-093741-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30916 and previous config saved to /var/cache/conftool/dbconfig/20220706-093733-root.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30915 and previous config saved to /var/cache/conftool/dbconfig/20220706-093634-root.json
  • 09:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:23 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2024.codfw.wmnet
  • 09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30914 and previous config saved to /var/cache/conftool/dbconfig/20220706-092248-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30913 and previous config saved to /var/cache/conftool/dbconfig/20220706-092237-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30912 and previous config saved to /var/cache/conftool/dbconfig/20220706-092229-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30911 and previous config saved to /var/cache/conftool/dbconfig/20220706-092130-root.json
  • 09:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30908 and previous config saved to /var/cache/conftool/dbconfig/20220706-091717-root.json
  • 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:15 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1039.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:14 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1039.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:14 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1038.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:13 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1038.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:13 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1037.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:11 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1037.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:11 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1036.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:10 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1036.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:10 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1035.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 09:09 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1035.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30907 and previous config saved to /var/cache/conftool/dbconfig/20220706-090744-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30906 and previous config saved to /var/cache/conftool/dbconfig/20220706-090731-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30905 and previous config saved to /var/cache/conftool/dbconfig/20220706-090725-root.json
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2039.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:02 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2039.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:02 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2038.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:01 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2038.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:01 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2037.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:00 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2037.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 09:00 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2036.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:58 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2036.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:55 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2034.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:54 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2034.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:54 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2033.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:53 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2033.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:53 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2032.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30904 and previous config saved to /var/cache/conftool/dbconfig/20220706-085240-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30903 and previous config saved to /var/cache/conftool/dbconfig/20220706-085227-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30902 and previous config saved to /var/cache/conftool/dbconfig/20220706-085221-root.json
  • 08:51 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2032.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:51 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2031.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:50 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2031.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:50 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2030.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:48 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2030.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:48 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2029.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:47 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2029.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:47 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2028.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:46 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2028.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:43 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1033.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:41 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1033.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:41 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1032.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:40 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1032.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:40 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1031.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:39 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1031.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:39 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1030.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:37 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1030.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:37 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1029.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30901 and previous config saved to /var/cache/conftool/dbconfig/20220706-083736-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30900 and previous config saved to /var/cache/conftool/dbconfig/20220706-083723-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30899 and previous config saved to /var/cache/conftool/dbconfig/20220706-083718-root.json
  • 08:36 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1029.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
  • 08:26 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1033.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30898 and previous config saved to /var/cache/conftool/dbconfig/20220706-082603-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30897 and previous config saved to /var/cache/conftool/dbconfig/20220706-082540-ladsgroup.json
  • 08:25 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1033.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:25 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1032.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:23 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1032.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30896 and previous config saved to /var/cache/conftool/dbconfig/20220706-082232-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30895 and previous config saved to /var/cache/conftool/dbconfig/20220706-082219-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30894 and previous config saved to /var/cache/conftool/dbconfig/20220706-082214-root.json
  • 08:21 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1031.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:20 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1031.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2024.codfw.wmnet with OS bullseye
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1030.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:14 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1030.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:12 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.19 refs T308072 (duration: 03m 39s)
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30893 and previous config saved to /var/cache/conftool/dbconfig/20220706-081059-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30892 and previous config saved to /var/cache/conftool/dbconfig/20220706-081036-ladsgroup.json
  • 08:09 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1029.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:08 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.19 refs T308072
  • 08:07 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1029.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30891 and previous config saved to /var/cache/conftool/dbconfig/20220706-080728-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30890 and previous config saved to /var/cache/conftool/dbconfig/20220706-080715-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30889 and previous config saved to /var/cache/conftool/dbconfig/20220706-080710-root.json
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2024.codfw.wmnet with reason: host reimage
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1028.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 08:01 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1028.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
  • 07:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2024.codfw.wmnet with reason: host reimage
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30888 and previous config saved to /var/cache/conftool/dbconfig/20220706-075555-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30887 and previous config saved to /var/cache/conftool/dbconfig/20220706-075532-ladsgroup.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30886 and previous config saved to /var/cache/conftool/dbconfig/20220706-075224-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30885 and previous config saved to /var/cache/conftool/dbconfig/20220706-075211-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30884 and previous config saved to /var/cache/conftool/dbconfig/20220706-075206-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P30883 and previous config saved to /var/cache/conftool/dbconfig/20220706-074721-root.json
  • 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2024.codfw.wmnet with OS bullseye
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30882 and previous config saved to /var/cache/conftool/dbconfig/20220706-074051-ladsgroup.json
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: Test done (T311106)', diff saved to https://phabricator.wikimedia.org/P30881 and previous config saved to /var/cache/conftool/dbconfig/20220706-074028-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1135, if anything breaks, it's marostegui's fault (T311106)', diff saved to https://phabricator.wikimedia.org/P30880 and previous config saved to /var/cache/conftool/dbconfig/20220706-073052-ladsgroup.json
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2024.codfw.wmnet with reason: Remove node for reimage
  • 07:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2024.codfw.wmnet with reason: Remove node for reimage
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:20 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/dt.init.less: Backport: Revert "Hide the lede section on mobile when DiscussionTools is enabled" (T312177) (duration: 03m 37s)
  • 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30879 and previous config saved to /var/cache/conftool/dbconfig/20220706-071157-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30878 and previous config saved to /var/cache/conftool/dbconfig/20220706-070835-ladsgroup.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30876 and previous config saved to /var/cache/conftool/dbconfig/20220706-065143-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30875 and previous config saved to /var/cache/conftool/dbconfig/20220706-063639-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30874 and previous config saved to /var/cache/conftool/dbconfig/20220706-062135-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30873 and previous config saved to /var/cache/conftool/dbconfig/20220706-060631-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30872 and previous config saved to /var/cache/conftool/dbconfig/20220706-055127-root.json
  • 05:48 marostegui: dbmaint x1@eqiad T312162
  • 05:48 marostegui: dbmaint s3@eqiad T312162
  • 05:46 marostegui: dbmaint s3@eqiad T312161
  • 05:45 marostegui: dbmaint x1@eqiad T312161
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30871 and previous config saved to /var/cache/conftool/dbconfig/20220706-053623-root.json
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: codfw s7 sanitarium master switch
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: codfw s7 sanitarium master switch
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30870 and previous config saved to /var/cache/conftool/dbconfig/20220706-052119-root.json
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: codfw s6 sanitarium master switch
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: codfw s6 sanitarium master switch
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2159 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30869 and previous config saved to /var/cache/conftool/dbconfig/20220706-051046-marostegui.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30868 and previous config saved to /var/cache/conftool/dbconfig/20220706-050615-root.json
  • 04:18 tstarling@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/AbuseFilter: T310662 deployment with possible post-send error spike due to ServiceWiring/FilterProfiler interdependency (duration: 03m 33s)
  • 04:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:34 tstarling@deploy1002: Finished scap: WRStats core prereq T310662 g811407 (duration: 17m 20s)
  • 03:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:17 tstarling@deploy1002: Started scap: WRStats core prereq T310662 g811407
  • 02:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:30 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T310662 g 811394 harmless prerequisite (duration: 03m 39s)
  • 02:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:28 mutante: gitlab1004 - rm /lib/systemd/system/rsync-config-backup-gitlab2001.wikimedia.org.*
  • 01:21 mutante: gitlab1004 rm /lib/systemd/system/rsync-data-backup-gitlab2001.wikimedia.org.* ; systemctl reset-failed (T274463, T307142) - fix icinga alert after gitlab2001 was decom'ed, we didn't have puppet remove the timer/service

2022-07-05

  • 23:30 ebernhardson: start restore of commonswiki_file from thanos-swift to cloudelastic
  • 23:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - T309648
  • 22:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - ryankemper@cumin1001 - T309648
  • 22:28 ryankemper: T309648 Manually restarting `cloudelastic1006` before proceeding to a normal rolling restart of cloudelastic
  • 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:55 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable title above tabs everywhere (T311773) (duration: 03m 23s)
  • 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:35 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: cirrus: Disable commonswiki writes to cloudelastic (T309648) (duration: 03m 42s)
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:27 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: job queue: Squelch errors related to unwritable cloudelastic (T309648) (duration: 03m 37s)
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:19 ebernhardson@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Backport: job queue: Squelch errors related to unwritable cloudelastic (T309648) (duration: 03m 43s)
  • 20:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2174.codfw.wmnet with OS bullseye
  • 20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
  • 20:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2174.codfw.wmnet with reason: host reimage
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2173.codfw.wmnet with OS bullseye
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: cirrus: Disable commonswiki writes to cloudelastic (T309648) (duration: 03m 23s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2174.codfw.wmnet with OS bullseye
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 66c9730: QuickSurveys: Increase coverage of research-incentive survey (T311015) (duration: 03m 28s)
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
  • 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2171.codfw.wmnet with OS bullseye
  • 20:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2173.codfw.wmnet with reason: host reimage
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b1c2171: GrowthExperiments: End mailing list campaign on eswiki (T307985) (duration: 03m 39s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2173.codfw.wmnet with OS bullseye
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2172.codfw.wmnet with OS bullseye
  • 19:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
  • 19:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2172.codfw.wmnet with OS bullseye
  • 18:53 papaul: power down moss-be2002 for NVMe installation
  • 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab2001.wikimedia.org
  • 18:52 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db2171.codfw.wmnet with OS bullseye
  • 18:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:40 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.wikimedia.org
  • 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab2001.codfw.wmnet
  • 18:39 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2170.codfw.wmnet with OS bullseye
  • 18:36 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:32 papaul: power down moss-be2001 for NVMe installation
  • 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 18:32 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab2001.codfw.wmnet
  • 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
  • 18:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
  • 18:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bullseye
  • 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2174
  • 18:01 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2174
  • 18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2173
  • 18:00 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2173
  • 17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2172
  • 17:59 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2172
  • 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2171
  • 17:57 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2171
  • 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2170
  • 17:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2170
  • 17:54 mutante: disabling puppet on gitlab* - debugging gerrit:811276
  • 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
  • 17:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2170.codfw.wmnet with OS bullseye
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:33 moritzm: installing haproxy security updates on stretch
  • 17:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2174.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bullseye
  • 16:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2169.codfw.wmnet with OS bullseye
  • 16:44 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
  • 16:34 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2173.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2172.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2169.codfw.wmnet with OS bullseye
  • 16:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2164.codfw.wmnet with OS bullseye
  • 15:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2171.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2170.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2164.codfw.wmnet with reason: host reimage
  • 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
  • 15:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2164.codfw.wmnet with OS bullseye
  • 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2169.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:09 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2169
  • 15:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2169
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db2169
  • 15:05 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db2169
  • 15:05 moritzm: installing firejail updates on stretch
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2164.codfw.wmnet with OS bullseye
  • 15:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:00 moritzm: draining ganeti2024 for eventual reimage T311686
  • 14:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type to DRBD
  • 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:22 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
  • 14:22 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 14:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2164.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:33 urbanecm: UTC afternoon B&C window done
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 urbanecm@deploy1002: Synchronized w/static.php: 300ef4a: static.php: Update call to deprecated IContextSource::getStats (duration: 03m 41s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:15 urbanecm@deploy1002: Synchronized wmf-config/: 1287b96: Drop deprecated feature flags (T310684) (duration: 03m 32s)
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 891057f: Drop dependent feature flags (T310684) (duration: 03m 37s)
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 12:42 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30861 and previous config saved to /var/cache/conftool/dbconfig/20220705-124101-ladsgroup.json
  • 12:37 btullis@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 12:36 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 12:31 moritzm: draining ganeti2023 for eventual reimage T311686
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'T311106', diff saved to https://phabricator.wikimedia.org/P30859 and previous config saved to /var/cache/conftool/dbconfig/20220705-122941-ladsgroup.json
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2158 to dbctl T311493', diff saved to https://phabricator.wikimedia.org/P30848 and previous config saved to /var/cache/conftool/dbconfig/20220705-110432-marostegui.json
  • 11:01 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:30 _joe_: running benchmarks in codfw for php7.2/7.4 comparison.
  • 10:29 moritzm: sudo gnt-cluster upgrade --to 3.0 for ganeti/codfw T311686
  • 10:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
  • 10:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.5.0 - volans@cumin1001
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:00 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.19 refs T308072
  • 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:36 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.19 refs T308072 (duration: 34m 21s)
  • 09:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
  • 09:33 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
  • 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:02 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.19 refs T308072
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0)
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces
  • 08:43 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:30 moritzm: uploaded 7.4.30-3+0~20220627.69+debian10~1.gbpf2b381+wmf1+buster3 to component/php74 (pulling php-common with the socket helper) T311386
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30835 and previous config saved to /var/cache/conftool/dbconfig/20220705-082415-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30834 and previous config saved to /var/cache/conftool/dbconfig/20220705-082058-root.json
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30833 and previous config saved to /var/cache/conftool/dbconfig/20220705-080911-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30832 and previous config saved to /var/cache/conftool/dbconfig/20220705-080554-root.json
  • 07:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 89aef54: MentorDashboard: enable the Vue version of the dashboard in beta (T300532) (duration: 03m 18s)
  • 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30831 and previous config saved to /var/cache/conftool/dbconfig/20220705-075408-root.json
  • 07:54 urbanecm@deploy1002: Synchronized logos/config.yaml: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 3/3) (duration: 03m 34s)
  • 07:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30830 and previous config saved to /var/cache/conftool/dbconfig/20220705-075050-root.json
  • 07:50 urbanecm@deploy1002: Synchronized wmf-config/: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 2/3) (duration: 03m 36s)
  • 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:46 urbanecm@deploy1002: Synchronized static/: c8c092a: trwiki: Change old and new vector logos for 500k articles (T311946; 1/3) (duration: 03m 17s)
  • 07:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30829 and previous config saved to /var/cache/conftool/dbconfig/20220705-073904-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30828 and previous config saved to /var/cache/conftool/dbconfig/20220705-073546-root.json
  • 07:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: ce64780: SuggestedEdits: Adjust thumbnailSource logic (T311789) (duration: 03m 32s)
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30827 and previous config saved to /var/cache/conftool/dbconfig/20220705-072400-root.json
  • 07:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:21 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php: d5050b7: Retrieve pages-with-suggestion via Elastic scroll directly (T311476) (duration: 03m 32s)
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30826 and previous config saved to /var/cache/conftool/dbconfig/20220705-072043-root.json
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:17 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/CentralNotice/includes/specials/CentralNotice.php: 414b7b8: Only add tabs to special pages (T311944) (duration: 03m 30s)
  • 07:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 14df0e2: zh(wikiversity|wiktionary): Disable local upload (T312012) (duration: 03m 47s)
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30824 and previous config saved to /var/cache/conftool/dbconfig/20220705-070856-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30823 and previous config saved to /var/cache/conftool/dbconfig/20220705-070539-root.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: codfw s3 sanitarium master switch
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Decommission db2073 T311837', diff saved to https://phabricator.wikimedia.org/P30822 and previous config saved to /var/cache/conftool/dbconfig/20220705-070019-marostegui.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2073.codfw.wmnet
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30821 and previous config saved to /var/cache/conftool/dbconfig/20220705-065352-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30820 and previous config saved to /var/cache/conftool/dbconfig/20220705-065035-root.json
  • 06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2073.codfw.wmnet
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30819 and previous config saved to /var/cache/conftool/dbconfig/20220705-063848-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30818 and previous config saved to /var/cache/conftool/dbconfig/20220705-063531-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30817 and previous config saved to /var/cache/conftool/dbconfig/20220705-063402-root.json
  • 06:09 marostegui: dbmaint s6@eqiad T298557
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 T311522', diff saved to https://phabricator.wikimedia.org/P30816 and previous config saved to /var/cache/conftool/dbconfig/20220705-060526-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T311522', diff saved to https://phabricator.wikimedia.org/P30814 and previous config saved to /var/cache/conftool/dbconfig/20220705-060111-marostegui.json
  • 06:00 marostegui: Starting s6 eqiad failover from db1131 to db1173 - T311522
  • 05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 05:58 TimStarling: deploying multi-DC support g 801621, manual puppet run on cp1080
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 T311522', diff saved to https://phabricator.wikimedia.org/P30813 and previous config saved to /var/cache/conftool/dbconfig/20220705-052219-marostegui.json
  • 05:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T311522
  • 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s6 T311522
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
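
The cookbook entries above follow a regular shape: a timestamp, `user@host`, then `END (STATUS) - Cookbook NAME (exit_code=N)`. As a minimal sketch, assuming that shape holds for every cookbook line, the runs that did not pass could be pulled out as follows. The helper names `parse_end` and `failed_runs` are hypothetical and are not part of Stashbot or the cookbook tooling.

```python
import re

# Assumed entry shape, taken from the log lines above:
#   "HH:MM user@host: END (STATUS) - Cookbook NAME (exit_code=N) ..."
# The leading bullet is optional so the pattern also works on stripped lines.
END_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"END \((?P<status>[A-Z]+)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def parse_end(line):
    """Return the fields of a cookbook END entry, or None for any other line."""
    match = END_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    return fields


def failed_runs(lines):
    """Collect END entries whose status is anything other than PASS."""
    parsed = (parse_end(line) for line in lines)
    return [entry for entry in parsed if entry and entry["status"] != "PASS"]


# Three entries copied from the log above: one FAIL, one PASS, one free-text note.
sample = [
    "  • 18:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db2171.codfw.wmnet with OS bullseye",
    "  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2170.codfw.wmnet with OS bullseye",
    "  • 18:32 papaul: power down moss-be2001 for NVMe installation",
]

for entry in failed_runs(sample):
    print(entry["time"], entry["who"], entry["cookbook"], entry["code"])
```

Free-text entries such as the `papaul` line do not match the pattern and are skipped rather than treated as failures.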

2022-07-04

  • 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org
  • 19:53 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 19:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2004-dev.wikimedia.org
  • 19:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
  • 19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 8 hosts with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 8 hosts with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30811 and previous config saved to /var/cache/conftool/dbconfig/20220704-192955-ladsgroup.json
  • 19:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2003-dev.wikimedia.org
  • 19:27 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2004-dev.wikimedia.org
  • 19:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
  • 19:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 19:17 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2003-dev.wikimedia.org
  • 19:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 19:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2001-dev.wikimedia.org
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P30810 and previous config saved to /var/cache/conftool/dbconfig/20220704-191450-ladsgroup.json
  • 19:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1003.wikimedia.org
  • 19:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.wikimedia.org
  • 19:01 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.wikimedia.org
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P30809 and previous config saved to /var/cache/conftool/dbconfig/20220704-185945-ladsgroup.json
  • 18:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1004.wikimedia.org
  • 18:53 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
  • 18:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.wikimedia.org
  • 18:52 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
  • 18:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1003.wikimedia.org
  • 18:51 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30808 and previous config saved to /var/cache/conftool/dbconfig/20220704-184440-ladsgroup.json
  • 18:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.wikimedia.org
  • 18:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1003.wikimedia.org
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30807 and previous config saved to /var/cache/conftool/dbconfig/20220704-184231-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30806 and previous config saved to /var/cache/conftool/dbconfig/20220704-184211-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P30805 and previous config saved to /var/cache/conftool/dbconfig/20220704-182706-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P30804 and previous config saved to /var/cache/conftool/dbconfig/20220704-181200-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30803 and previous config saved to /var/cache/conftool/dbconfig/20220704-175655-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T312027)', diff saved to https://phabricator.wikimedia.org/P30802 and previous config saved to /var/cache/conftool/dbconfig/20220704-175446-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30801 and previous config saved to /var/cache/conftool/dbconfig/20220704-175425-ladsgroup.json
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P30800 and previous config saved to /var/cache/conftool/dbconfig/20220704-173920-ladsgroup.json
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P30799 and previous config saved to /var/cache/conftool/dbconfig/20220704-172415-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30798 and previous config saved to /var/cache/conftool/dbconfig/20220704-170910-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30797 and previous config saved to /var/cache/conftool/dbconfig/20220704-170800-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30796 and previous config saved to /var/cache/conftool/dbconfig/20220704-170740-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P30795 and previous config saved to /var/cache/conftool/dbconfig/20220704-165235-ladsgroup.json
  • 16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P30793 and previous config saved to /var/cache/conftool/dbconfig/20220704-163730-ladsgroup.json
  • 16:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30792 and previous config saved to /var/cache/conftool/dbconfig/20220704-162225-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312027)', diff saved to https://phabricator.wikimedia.org/P30791 and previous config saved to /var/cache/conftool/dbconfig/20220704-162015-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30790 and previous config saved to /var/cache/conftool/dbconfig/20220704-161944-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P30789 and previous config saved to /var/cache/conftool/dbconfig/20220704-161817-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P30788 and previous config saved to /var/cache/conftool/dbconfig/20220704-160439-ladsgroup.json
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P30787 and previous config saved to /var/cache/conftool/dbconfig/20220704-160314-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P30786 and previous config saved to /var/cache/conftool/dbconfig/20220704-154933-ladsgroup.json
  • 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P30785 and previous config saved to /var/cache/conftool/dbconfig/20220704-154810-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30784 and previous config saved to /var/cache/conftool/dbconfig/20220704-153428-ladsgroup.json
  • 15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P30783 and previous config saved to /var/cache/conftool/dbconfig/20220704-153306-ladsgroup.json
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312027)', diff saved to https://phabricator.wikimedia.org/P30782 and previous config saved to /var/cache/conftool/dbconfig/20220704-153218-ladsgroup.json
  • 15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T305300)', diff saved to https://phabricator.wikimedia.org/P30781 and previous config saved to /var/cache/conftool/dbconfig/20220704-152931-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 14:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 14:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Exempt WMCS ranges from globalblocking everywhere (T307648) (duration: 03m 26s)
  • 14:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 14:20 oblivian@deploy1002: Synchronized README: testing new php restart script (duration: 03m 23s)
  • 14:19 elukey: roll restart of thanos-fe's proxy to pick up a new account - T311628
  • 14:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 14:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 14:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 14:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 14:10 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set GlobalBlockingAllowedRanges for testwiki (T307648) (duration: 03m 39s)
  • 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:05 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 14:05 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 13:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 13:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 13:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 13:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 13:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 13:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 13:11 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 12:38 jynus: running alter table on dbbackups db T283017
  • 12:27 _joe_: updated etcdmirror to 0.0.8 everywhere
  • 12:17 moritzm: installing 4.9.320 on stretch hosts
  • 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:55 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GlobalBlocking/includes/GlobalBlocking.php: Backport: Add statsd metric collection on db calls (T307648) (duration: 03m 26s)
  • 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:50 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addimage/AddImageArticleTarget.js: Backport: AddImageArticleTarget: Update to new mediaClass/mediaTag format (T311916) (duration: 03m 33s)
  • 11:36 marostegui@cumin2002: dbctl commit (dc=all): 'Add db2156 to s3 T311493', diff saved to https://phabricator.wikimedia.org/P30774 and previous config saved to /var/cache/conftool/dbconfig/20220704-113640-marostegui.json
  • 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:54 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.18/includes: Backport: Revert "Revert "RecentChange: Straight join to actor table when needed"" (T311360) (duration: 03m 49s)
  • 10:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:25 _joe_: rollback etcdmirror to 0.0.6 on conf2005
  • 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:25 godog: silence etcd page
  • 10:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:21 _joe_: restarting etcdmirror on conf2005
  • 10:21 moritzm: installing gnupg2 security updates
  • 10:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:17 _joe_: upgraded etcdmirror to 0.0.7 on conf2006, now going with the rest of codfw
  • 10:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:24 marostegui@cumin2002: dbctl commit (dc=all): 'Add db2157 to s5 T311493', diff saved to https://phabricator.wikimedia.org/P30758 and previous config saved to /var/cache/conftool/dbconfig/20220704-082406-marostegui.json
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MewOphaswongse out of all services on: 634 hosts
  • 08:07 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MewOphaswongse out of all services on: 634 hosts
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MewOphaswongse out of all services on: 1299 hosts
  • 08:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging MewOphaswongse out of all services on: 1299 hosts
  • 08:04 elukey: kill leftover processes of user `mewoph` on stat100x to allow puppet runs
  • 07:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
  • 07:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
  • 06:49 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2092.codfw.wmnet
  • 06:47 marostegui@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:43 marostegui@cumin2002: START - Cookbook sre.dns.netbox
  • 06:39 marostegui@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2092.codfw.wmnet
  • 06:34 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2091.codfw.wmnet
  • 06:32 marostegui@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:28 marostegui@cumin2002: START - Cookbook sre.dns.netbox
  • 06:24 marostegui@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2091.codfw.wmnet
  • 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: codfw s4 sanitarium master switch
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: codfw s4 sanitarium master switch

2022-07-03

  • 11:36 _joe_: temporarily raised replicas for shellbox to 24
  • 11:35 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply

2022-07-02

  • 05:36 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:36 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:24 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:23 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:21 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 05:20 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 05:11 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 05:11 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 04:49 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 04:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 04:48 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 04:48 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:59 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:59 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 03:56 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 03:56 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 02:49 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 02:49 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 01:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance

2022-07-01

  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30753 and previous config saved to /var/cache/conftool/dbconfig/20220701-235524-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30752 and previous config saved to /var/cache/conftool/dbconfig/20220701-234019-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P30751 and previous config saved to /var/cache/conftool/dbconfig/20220701-232514-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30750 and previous config saved to /var/cache/conftool/dbconfig/20220701-231009-ladsgroup.json
  • 23:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1012.eqiad.wmnet with reason: host reimage
  • 22:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1012.eqiad.wmnet with reason: host reimage
  • 22:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1015.eqiad.wmnet with OS bullseye
  • 22:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1015.eqiad.wmnet with reason: host reimage
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T309311)', diff saved to https://phabricator.wikimedia.org/P30749 and previous config saved to /var/cache/conftool/dbconfig/20220701-221438-ladsgroup.json
  • 22:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1015.eqiad.wmnet with reason: host reimage
  • 22:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30748 and previous config saved to /var/cache/conftool/dbconfig/20220701-221418-ladsgroup.json
  • 22:12 mutante: restbase2018 - attempting power cycle via mgmt - /admin1-> racadm serveraction powercycle (T311890)
  • 22:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1014.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1013.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1008.eqiad.wmnet with OS bullseye
  • 22:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1010.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P30747 and previous config saved to /var/cache/conftool/dbconfig/20220701-215913-ladsgroup.json
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1009.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1011.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 21:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1012.eqiad.wmnet with OS bullseye
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1014.eqiad.wmnet with reason: host reimage
  • 21:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1010.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1008.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1013.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 21:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1008.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1013.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1010.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1014.eqiad.wmnet with reason: host reimage
  • 21:48 mutante: https://doc.wikimedia.org switched to doc1002 backend on buster T247653
  • 21:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stat1009.eqiad.wmnet with OS bullseye
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P30746 and previous config saved to /var/cache/conftool/dbconfig/20220701-214408-ladsgroup.json
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1015.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1010.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1008.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1013.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1009.eqiad.wmnet with OS bullseye
  • 21:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1012.eqiad.wmnet with OS bullseye
  • 21:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1014.eqiad.wmnet with OS bullseye
  • 21:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1006.eqiad.wmnet with OS bullseye
  • 21:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat1009.eqiad.wmnet with reason: host reimage
  • 21:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stat1009.eqiad.wmnet with reason: host reimage
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30745 and previous config saved to /var/cache/conftool/dbconfig/20220701-212903-ladsgroup.json
  • 21:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1006.eqiad.wmnet with reason: host reimage
  • 21:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host stat1009.eqiad.wmnet with OS bullseye
  • 21:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1006.eqiad.wmnet with reason: host reimage
  • 21:09 mutante: https://doc.wikimedia.org - scheduled maintenance period - switching to buster backend doc1002 (T247653)
  • 21:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
  • 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T309311)', diff saved to https://phabricator.wikimedia.org/P30744 and previous config saved to /var/cache/conftool/dbconfig/20220701-203251-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30743 and previous config saved to /var/cache/conftool/dbconfig/20220701-203231-ladsgroup.json
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P30742 and previous config saved to /var/cache/conftool/dbconfig/20220701-201726-ladsgroup.json
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P30741 and previous config saved to /var/cache/conftool/dbconfig/20220701-200221-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30740 and previous config saved to /var/cache/conftool/dbconfig/20220701-194716-ladsgroup.json
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30739 and previous config saved to /var/cache/conftool/dbconfig/20220701-183504-ladsgroup.json
  • 18:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30738 and previous config saved to /var/cache/conftool/dbconfig/20220701-183444-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P30737 and previous config saved to /var/cache/conftool/dbconfig/20220701-181939-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P30736 and previous config saved to /var/cache/conftool/dbconfig/20220701-180434-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30735 and previous config saved to /var/cache/conftool/dbconfig/20220701-174929-ladsgroup.json
  • 17:47 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 17:47 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T309311)', diff saved to https://phabricator.wikimedia.org/P30734 and previous config saved to /var/cache/conftool/dbconfig/20220701-165407-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30733 and previous config saved to /var/cache/conftool/dbconfig/20220701-165347-ladsgroup.json
  • 16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P30732 and previous config saved to /var/cache/conftool/dbconfig/20220701-163842-ladsgroup.json
  • 16:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS bullseye
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P30731 and previous config saved to /var/cache/conftool/dbconfig/20220701-162337-ladsgroup.json
  • 16:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30730 and previous config saved to /var/cache/conftool/dbconfig/20220701-160831-ladsgroup.json
  • 15:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS bullseye
  • 15:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2167.codfw.wmnet with OS bullseye
  • 15:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2166.codfw.wmnet with OS bullseye
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 15:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
  • 15:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 15:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 15:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore[1008-1009]
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T309311)', diff saved to https://phabricator.wikimedia.org/P30729 and previous config saved to /var/cache/conftool/dbconfig/20220701-145937-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2166.codfw.wmnet with reason: host reimage
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2167.codfw.wmnet with OS bullseye
  • 14:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2166.codfw.wmnet with OS bullseye
  • 14:39 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudstore[1008-1009]
  • 14:05 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 14:04 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30728 and previous config saved to /var/cache/conftool/dbconfig/20220701-135831-ladsgroup.json
  • 13:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 07s)
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P30727 and previous config saved to /var/cache/conftool/dbconfig/20220701-134326-ladsgroup.json
  • 13:43 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:36 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:36 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P30726 and previous config saved to /var/cache/conftool/dbconfig/20220701-132821-ladsgroup.json
  • 13:23 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 13:23 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:19 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:19 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30725 and previous config saved to /var/cache/conftool/dbconfig/20220701-131316-ladsgroup.json
  • 13:12 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:12 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:08 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 13:08 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2155 to s4 T311493', diff saved to https://phabricator.wikimedia.org/P30724 and previous config saved to /var/cache/conftool/dbconfig/20220701-130106-marostegui.json
  • 12:38 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:38 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 12:37 moritzm: uploaded rsyslog 8.2102.0-2+deb11u1+wmf2 to component/rsyslog-k8s (backport of latest security fixes on top of the rsyslog with mmkubernetes plugin)
  • 12:09 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:09 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T309311)', diff saved to https://phabricator.wikimedia.org/P30723 and previous config saved to /var/cache/conftool/dbconfig/20220701-120657-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30722 and previous config saved to /var/cache/conftool/dbconfig/20220701-120636-ladsgroup.json
  • 12:02 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 12:02 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30721 and previous config saved to /var/cache/conftool/dbconfig/20220701-115414-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P30720 and previous config saved to /var/cache/conftool/dbconfig/20220701-115131-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P30719 and previous config saved to /var/cache/conftool/dbconfig/20220701-113909-ladsgroup.json
  • 11:38 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 11:38 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P30718 and previous config saved to /var/cache/conftool/dbconfig/20220701-113626-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P30717 and previous config saved to /var/cache/conftool/dbconfig/20220701-112404-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30716 and previous config saved to /var/cache/conftool/dbconfig/20220701-112121-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30715 and previous config saved to /var/cache/conftool/dbconfig/20220701-110859-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T309311)', diff saved to https://phabricator.wikimedia.org/P30714 and previous config saved to /var/cache/conftool/dbconfig/20220701-110204-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30713 and previous config saved to /var/cache/conftool/dbconfig/20220701-110117-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P30712 and previous config saved to /var/cache/conftool/dbconfig/20220701-104612-ladsgroup.json
  • 10:45 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 08s)
  • 10:45 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 10:44 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b3fe77c]: (no justification provided) (duration: 00m 09s)
  • 10:44 bmansurov@deploy1002: Started deploy [airflow-dags/research@b3fe77c]: (no justification provided)
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P30711 and previous config saved to /var/cache/conftool/dbconfig/20220701-103107-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T309311)', diff saved to https://phabricator.wikimedia.org/P30710 and previous config saved to /var/cache/conftool/dbconfig/20220701-102810-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30709 and previous config saved to /var/cache/conftool/dbconfig/20220701-101602-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T309311)', diff saved to https://phabricator.wikimedia.org/P30708 and previous config saved to /var/cache/conftool/dbconfig/20220701-094927-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:35 marostegui: Stop mysql on db2073 for cloning db2155
  • 07:47 mmandere: kubemaster2001, restart rsyslog
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2154 to s8 T311493', diff saved to https://phabricator.wikimedia.org/P30705 and previous config saved to /var/cache/conftool/dbconfig/20220701-074607-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2153 to s1 T311493', diff saved to https://phabricator.wikimedia.org/P30704 and previous config saved to /var/cache/conftool/dbconfig/20220701-073512-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2091 from dbctl T311803', diff saved to https://phabricator.wikimedia.org/P30703 and previous config saved to /var/cache/conftool/dbconfig/20220701-060000-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2092 from dbctl T311802', diff saved to https://phabricator.wikimedia.org/P30701 and previous config saved to /var/cache/conftool/dbconfig/20220701-054102-marostegui.json
  • 02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2165.codfw.wmnet with OS bullseye
  • 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
  • 02:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2165.codfw.wmnet with reason: host reimage
  • 02:06 krinkle@deploy1002: Synchronized wmf-config/: I60edfb0f60 (3/3) (duration: 03m 31s)
  • 02:01 krinkle@deploy1002: Synchronized multiversion/: I60edfb0f60 (2/3) (duration: 03m 34s)
  • 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2165.codfw.wmnet with OS bullseye
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2163.codfw.wmnet with OS bullseye
  • 01:39 krinkle@deploy1002: Synchronized tests/: I60edfb0f60 (1/3) (duration: 03m 32s)
  • 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
  • 01:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2163.codfw.wmnet with reason: host reimage
  • 01:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:30 krinkle@deploy1002: Synchronized src/: I796f38 (3/3) (duration: 03m 24s)
  • 01:26 krinkle@deploy1002: Synchronized multiversion/: I796f38 (2/3) (duration: 03m 32s)
  • 01:23 krinkle@deploy1002: Synchronized tests/: I796f38 (1/3) (duration: 03m 41s)
  • 01:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2162.codfw.wmnet with OS bullseye
  • 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2163.codfw.wmnet with OS bullseye
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2162.codfw.wmnet with reason: host reimage
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2161.codfw.wmnet with OS bullseye
  • 00:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2162.codfw.wmnet with reason: host reimage
  • 00:53 ejegg: updated payments-wiki from ef53c82e to 78dee85e
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2168.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2167.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
  • 00:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2161.codfw.wmnet with reason: host reimage
  • 00:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2162.codfw.wmnet with OS bullseye
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2168.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2167.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2166.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2165.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2161.codfw.wmnet with OS bullseye
  • 00:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2166.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2163.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2165.mgmt.codfw.wmnet with reboot policy FORCED

Archives

See Server Admin Log/Archives.