You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static

Revision as of 01:43, 11 April 2022

2022-04-11

  • 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24355 and previous config saved to /var/cache/conftool/dbconfig/20220411-014316-ladsgroup.json
  • 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24354 and previous config saved to /var/cache/conftool/dbconfig/20220411-004826-ladsgroup.json
  • 00:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 00:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24353 and previous config saved to /var/cache/conftool/dbconfig/20220411-004817-ladsgroup.json
  • 00:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24352 and previous config saved to /var/cache/conftool/dbconfig/20220411-003312-ladsgroup.json
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24351 and previous config saved to /var/cache/conftool/dbconfig/20220411-001807-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24350 and previous config saved to /var/cache/conftool/dbconfig/20220411-000302-ladsgroup.json

2022-04-10

  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24349 and previous config saved to /var/cache/conftool/dbconfig/20220410-231112-ladsgroup.json
  • 23:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24348 and previous config saved to /var/cache/conftool/dbconfig/20220410-231104-ladsgroup.json
  • 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24347 and previous config saved to /var/cache/conftool/dbconfig/20220410-225559-ladsgroup.json
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24346 and previous config saved to /var/cache/conftool/dbconfig/20220410-224053-ladsgroup.json
  • 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24345 and previous config saved to /var/cache/conftool/dbconfig/20220410-222548-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24344 and previous config saved to /var/cache/conftool/dbconfig/20220410-212042-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24343 and previous config saved to /var/cache/conftool/dbconfig/20220410-212024-ladsgroup.json
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24342 and previous config saved to /var/cache/conftool/dbconfig/20220410-210519-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24341 and previous config saved to /var/cache/conftool/dbconfig/20220410-205014-ladsgroup.json
  • 20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24340 and previous config saved to /var/cache/conftool/dbconfig/20220410-203508-ladsgroup.json
  • 19:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24339 and previous config saved to /var/cache/conftool/dbconfig/20220410-193900-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
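The dbctl commit entries above all share one fixed layout: time, operator, datacenter scope, quoted message, Phabricator paste URL, and the path of the saved previous config. Judging from these entries, the backup file name appears to encode the commit date, time, and user (for example `20220411-014316-ladsgroup.json`). The following is a minimal Python sketch that extracts those fields. It assumes that every commit entry uses exactly the wording shown in this log; `parse_dbctl_entry` and `DBCTL_RE` are hypothetical names, not part of dbctl or conftool.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed shape of a rendered dbctl commit entry (list bullet optional):
#   HH:MM user@host: dbctl commit (dc=...): '<message>', diff saved to <url>
#   and previous config saved to <path>
DBCTL_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}) (?P<actor>[^:]+): dbctl commit \(dc=(?P<dc>[^)]+)\): "
    r"'(?P<message>[^']*)', diff saved to (?P<diff_url>\S+) "
    r"and previous config saved to (?P<config_path>\S+)$"
)


def parse_dbctl_entry(line):
    """Return a dict of fields for a dbctl commit entry, or None for any other entry."""
    # Drop surrounding whitespace and the rendered list bullet, if present.
    match = DBCTL_RE.match(line.strip().lstrip("•").strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: the backup file name is YYYYMMDD-HHMMSS-<user>.json.
    day, clock, user = PurePosixPath(fields["config_path"]).stem.split("-", 2)
    fields["saved_at"] = datetime.strptime(day + clock, "%Y%m%d%H%M%S")
    fields["config_user"] = user
    return fields


entry = parse_dbctl_entry(
    "01:43 ladsgroup@cumin1001: dbctl commit (dc=all): "
    "'Repooling after maintenance db1164 (T298565)', "
    "diff saved to https://phabricator.wikimedia.org/P24355 and previous config "
    "saved to /var/cache/conftool/dbconfig/20220411-014316-ladsgroup.json"
)
print(entry["message"], entry["diff_url"], entry["saved_at"])
```

Entries that are not dbctl commits, such as the cookbook START/END lines, return `None`. The decoded `saved_at` value is seconds-precise, so it can differ slightly from the minute shown at the start of the entry.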

2022-04-09

  • 12:39 godog: bounce prometheus@ops on prometheus5001
  • 12:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
  • 12:22 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 12:22 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1002.eqiad.wmnet
  • 12:22 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 12:20 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1002.eqiad.wmnet
  • 12:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 03:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24337 and previous config saved to /var/cache/conftool/dbconfig/20220409-030854-ladsgroup.json
  • 02:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24336 and previous config saved to /var/cache/conftool/dbconfig/20220409-025349-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24335 and previous config saved to /var/cache/conftool/dbconfig/20220409-023843-ladsgroup.json
  • 02:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24334 and previous config saved to /var/cache/conftool/dbconfig/20220409-022338-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24333 and previous config saved to /var/cache/conftool/dbconfig/20220409-005351-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24332 and previous config saved to /var/cache/conftool/dbconfig/20220409-005338-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24331 and previous config saved to /var/cache/conftool/dbconfig/20220409-003832-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24330 and previous config saved to /var/cache/conftool/dbconfig/20220409-002327-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24329 and previous config saved to /var/cache/conftool/dbconfig/20220409-000822-ladsgroup.json
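Cookbook runs are logged as a START entry followed by an END entry that carries PASS or FAIL and an exit code. For example, the dumpsdata1002 reboot above failed twice with exit_code=99 before it passed. The following is a minimal Python sketch that pairs the two entries into runs. It assumes the wording shown in these entries, and that a START and its END share the same operator, cookbook name, and target text. `pair_cookbook_runs` and `COOKBOOK_RE` are hypothetical names, not part of any Wikimedia tooling.

```python
import re

# Assumed shape of the cookbook entries in this log (list bullet optional):
#   HH:MM user@host: START - Cookbook <name> <target>
#   HH:MM user@host: END (PASS|FAIL) - Cookbook <name> (exit_code=N) <target>
COOKBOOK_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}) (?P<actor>[^:]+): "
    r"(?P<phase>START|END)(?: \((?P<result>PASS|FAIL)\))? - Cookbook (?P<name>\S+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))? (?P<target>.*)$"
)


def pair_cookbook_runs(lines):
    """Pair START/END entries into runs; `lines` is newest-first, as on the page."""
    open_starts = {}  # (actor, cookbook, target) -> stack of unmatched START times
    runs = []
    for line in reversed(lines):  # walk the log oldest-first
        match = COOKBOOK_RE.match(line.strip().lstrip("•").strip())
        if match is None:
            continue  # not a cookbook entry
        key = (match["actor"], match["name"], match["target"])
        if match["phase"] == "START":
            open_starts.setdefault(key, []).append(match["time"])
        elif open_starts.get(key):
            # Close the most recent unmatched START for the same cookbook and target.
            code = match["exit_code"]
            runs.append({
                "cookbook": match["name"],
                "target": match["target"],
                "started": open_starts[key].pop(),
                "ended": match["time"],
                "result": match["result"],
                "exit_code": int(code) if code is not None else None,
            })
    return runs


dumpsdata = [
    "12:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet",
    "12:22 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet",
    "12:22 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1002.eqiad.wmnet",
    "12:22 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet",
    "12:20 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1002.eqiad.wmnet",
    "12:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet",
]
for run in pair_cookbook_runs(dumpsdata):
    print(run["started"], run["ended"], run["result"], run["exit_code"])
```

The sketch skips an END whose START falls outside the supplied lines, as well as entries without a timestamp.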

2022-04-08

  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24328 and previous config saved to /var/cache/conftool/dbconfig/20220408-225350-ladsgroup.json
  • 22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24327 and previous config saved to /var/cache/conftool/dbconfig/20220408-225342-ladsgroup.json
  • 22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24326 and previous config saved to /var/cache/conftool/dbconfig/20220408-223837-ladsgroup.json
  • 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24325 and previous config saved to /var/cache/conftool/dbconfig/20220408-222332-ladsgroup.json
  • 22:09 mutante: gitlab - deleted runner-1008 (to replace it with a bullseye instance), recreated runner-1020 with same flavor as existing runners T297659
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24324 and previous config saved to /var/cache/conftool/dbconfig/20220408-220827-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24323 and previous config saved to /var/cache/conftool/dbconfig/20220408-204138-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24322 and previous config saved to /var/cache/conftool/dbconfig/20220408-204129-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24321 and previous config saved to /var/cache/conftool/dbconfig/20220408-202624-ladsgroup.json
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24320 and previous config saved to /var/cache/conftool/dbconfig/20220408-201119-ladsgroup.json
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24319 and previous config saved to /var/cache/conftool/dbconfig/20220408-195614-ladsgroup.json
  • 18:38 mutante: gitlab1001 - giving myself gitlab admin rights via rake console, to be able to connect/disconnect runners T297659
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24318 and previous config saved to /var/cache/conftool/dbconfig/20220408-183643-ladsgroup.json
  • 18:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24317 and previous config saved to /var/cache/conftool/dbconfig/20220408-183635-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24316 and previous config saved to /var/cache/conftool/dbconfig/20220408-182130-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24315 and previous config saved to /var/cache/conftool/dbconfig/20220408-180625-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24313 and previous config saved to /var/cache/conftool/dbconfig/20220408-175120-ladsgroup.json
  • 17:35 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:35 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:34 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:34 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24312 and previous config saved to /var/cache/conftool/dbconfig/20220408-162938-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24311 and previous config saved to /var/cache/conftool/dbconfig/20220408-162930-ladsgroup.json
  • 16:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24309 and previous config saved to /var/cache/conftool/dbconfig/20220408-155919-ladsgroup.json
  • 15:53 dancy: dancy@deploy1002: Testing mw container image build
  • 15:52 dancy@deploy1002: Started scap: (no justification provided)
  • 15:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:51 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24308 and previous config saved to /var/cache/conftool/dbconfig/20220408-154414-ladsgroup.json
  • away: re-enabled fundraising scheduled jobs
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24307 and previous config saved to /var/cache/conftool/dbconfig/20220408-143545-root.json
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24306 and previous config saved to /var/cache/conftool/dbconfig/20220408-142239-ladsgroup.json
  • 14:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24305 and previous config saved to /var/cache/conftool/dbconfig/20220408-142230-ladsgroup.json
  • 14:21 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again again again again; keeping queue below the p.age threshold while fr-tech work)
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24304 and previous config saved to /var/cache/conftool/dbconfig/20220408-142041-root.json
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24303 and previous config saved to /var/cache/conftool/dbconfig/20220408-140725-ladsgroup.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24302 and previous config saved to /var/cache/conftool/dbconfig/20220408-140536-root.json
  • 14:02 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1008.eqiad.wmnet with OS bullseye
  • 13:57 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2008.codfw.wmnet with OS bullseye
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24300 and previous config saved to /var/cache/conftool/dbconfig/20220408-135220-ladsgroup.json
  • 13:51 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1008.eqiad.wmnet with reason: host reimage
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P24299 and previous config saved to /var/cache/conftool/dbconfig/20220408-135032-root.json
  • 13:50 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again again again again; keeping queue below the p.age threshold while fr-tech work)
  • 13:47 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1008.eqiad.wmnet with reason: host reimage
  • 13:46 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2008.codfw.wmnet with reason: host reimage
  • 13:43 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2008.codfw.wmnet with reason: host reimage
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24298 and previous config saved to /var/cache/conftool/dbconfig/20220408-133715-ladsgroup.json
  • 13:37 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS bullseye
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P24297 and previous config saved to /var/cache/conftool/dbconfig/20220408-133528-root.json
  • 13:30 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2008.codfw.wmnet with OS bullseye
  • 13:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubemaster1001.eqiad.wmnet with reason: reimage
  • 13:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubemaster1001.eqiad.wmnet with reason: reimage
  • 13:20 mmandere: pool cp6001 with HAProxy as TLS termination layer - T290005
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P24296 and previous config saved to /var/cache/conftool/dbconfig/20220408-132024-root.json
  • 13:18 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again again again; keeping queue below the p.age threshold while fr-tech work)
  • 13:16 mmandere: pool cp6009 with HAProxy as TLS termination layer - T290005
  • 13:13 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS buster
  • 13:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
  • 13:00 gmodena@deploy1002: Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 02m 11s)
  • 12:59 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again again)
  • 12:58 gmodena@deploy1002: Started deploy [airflow-dags/research@b029f10]: (no justification provided)
  • 12:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubemaster1002.eqiad.wmnet with reason: reimage
  • 12:57 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubemaster1002.eqiad.wmnet with reason: reimage
  • 12:54 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again)
  • 12:49 ejegg: disabled paypal IPN listener failmail
  • 12:44 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
  • 12:40 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
  • 12:33 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 12:29 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 12:22 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
  • 12:15 mmandere: depool cp6001 for reimage - T290005
  • 12:11 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS buster
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24295 and previous config saved to /var/cache/conftool/dbconfig/20220408-121138-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:45 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184', diff saved to https://phabricator.wikimedia.org/P24294 and previous config saved to /var/cache/conftool/dbconfig/20220408-113452-root.json
  • 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:11 mmandere: depool cp6009 for reimage - T290005
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:18 mmandere: pool cp6002 with HAProxy as TLS termination layer - T290005
  • 10:15 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1003.eqiad.wmnet with OS bullseye
  • 10:11 mmandere: pool cp6010 with HAProxy as TLS termination layer - T290005
  • 10:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS buster
  • 10:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS buster
  • 10:04 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1003.eqiad.wmnet with reason: host reimage
  • 10:00 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1003.eqiad.wmnet with reason: host reimage
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24293 and previous config saved to /var/cache/conftool/dbconfig/20220408-095458-ladsgroup.json
  • 09:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS buster
  • 09:48 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host dbprov1003.eqiad.wmnet with OS bullseye
  • 09:47 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2003.codfw.wmnet with OS bullseye
  • 09:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24292 and previous config saved to /var/cache/conftool/dbconfig/20220408-094325-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24291 and previous config saved to /var/cache/conftool/dbconfig/20220408-093953-ladsgroup.json
  • 09:35 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1002.eqiad.wmnet with OS bullseye
  • 09:35 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2003.codfw.wmnet with reason: host reimage
  • 09:32 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
  • 09:30 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2003.codfw.wmnet with reason: host reimage
  • 09:29 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24290 and previous config saved to /var/cache/conftool/dbconfig/20220408-092820-ladsgroup.json
  • 09:25 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS buster
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24289 and previous config saved to /var/cache/conftool/dbconfig/20220408-092448-ladsgroup.json
  • 09:24 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: host reimage
  • 09:19 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: host reimage
  • 09:16 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host dbprov2003.codfw.wmnet with OS bullseye
  • 09:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 09:13 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS buster
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24288 and previous config saved to /var/cache/conftool/dbconfig/20220408-091315-ladsgroup.json
  • 09:13 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24287 and previous config saved to /var/cache/conftool/dbconfig/20220408-090943-ladsgroup.json
  • 09:08 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host dbprov1002.eqiad.wmnet with OS bullseye
  • 09:08 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2002.codfw.wmnet with OS bullseye
  • 09:02 mmandere: depool cp6002 for reimage - T290005
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24286 and previous config saved to /var/cache/conftool/dbconfig/20220408-085810-ladsgroup.json
  • 08:57 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2102.codfw.wmnet with OS bullseye
  • 08:57 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS buster
  • 08:56 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2002.codfw.wmnet with reason: host reimage
  • 08:53 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2002.codfw.wmnet with reason: host reimage
  • 08:49 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1001.eqiad.wmnet with OS bullseye
  • 08:48 mmandere: depool cp6010 for reimage - T290005
  • 08:43 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2102.codfw.wmnet with reason: host reimage
  • 08:41 mmandere: pool cp6003 with HAProxy as TLS termination layer - T290005
  • 08:40 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2102.codfw.wmnet with reason: host reimage
  • 08:40 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host dbprov2002.codfw.wmnet with OS bullseye
  • 08:37 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS buster
  • 08:36 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1001.eqiad.wmnet with reason: host reimage
  • 08:33 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1001.eqiad.wmnet with reason: host reimage
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24285 and previous config saved to /var/cache/conftool/dbconfig/20220408-083353-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24284 and previous config saved to /var/cache/conftool/dbconfig/20220408-083345-ladsgroup.json
  • 08:33 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2151.codfw.wmnet with OS bullseye
  • 08:29 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2102.codfw.wmnet with OS bullseye
  • 08:26 mmandere: pool cp6011 with HAProxy as TLS termination layer - T290005
  • 08:24 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
  • 08:21 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host dbprov1001.eqiad.wmnet with OS bullseye
  • 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24283 and previous config saved to /var/cache/conftool/dbconfig/20220408-081840-ladsgroup.json
  • 08:18 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2151.codfw.wmnet with reason: host reimage
  • 08:15 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2151.codfw.wmnet with reason: host reimage
  • 08:10 jynus: restart db1133 T299876
  • 08:06 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2001.codfw.wmnet with OS bullseye
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24282 and previous config saved to /var/cache/conftool/dbconfig/20220408-080335-ladsgroup.json
  • 08:01 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2151.codfw.wmnet with OS bullseye
  • 07:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
  • 07:54 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2001.codfw.wmnet with reason: host reimage
  • 07:50 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2001.codfw.wmnet with reason: host reimage
  • 07:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24281 and previous config saved to /var/cache/conftool/dbconfig/20220408-074829-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24280 and previous config saved to /var/cache/conftool/dbconfig/20220408-074723-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
  • 07:45 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
  • 07:42 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
  • 07:42 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 07:39 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 07:36 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host dbprov2001.codfw.wmnet with OS bullseye
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24279 and previous config saved to /var/cache/conftool/dbconfig/20220408-073442-root.json
  • 07:31 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
  • 07:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS buster
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24278 and previous config saved to /var/cache/conftool/dbconfig/20220408-072615-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 07:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 07:21 mmandere: depool cp6003 for reimage - T290005
  • 07:21 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24277 and previous config saved to /var/cache/conftool/dbconfig/20220408-071938-root.json
  • 07:12 mmandere: depool cp6011 for reimage - T290005
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24276 and previous config saved to /var/cache/conftool/dbconfig/20220408-070434-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P24275 and previous config saved to /var/cache/conftool/dbconfig/20220408-064930-root.json
  • 06:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P24274 and previous config saved to /var/cache/conftool/dbconfig/20220408-063426-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P24273 and previous config saved to /var/cache/conftool/dbconfig/20220408-061922-root.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P24272 and previous config saved to /var/cache/conftool/dbconfig/20220408-051044-root.json
  • 02:30 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: security updates - bking@cumin1001 - T304938

2022-04-07

  • 22:18 ejegg: restarted fundraising scheduled jobs
  • 22:08 ejegg: updated fundraising CiviCRM from 7b7b284d to a90a6709
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 21:46 ejegg: disabled fundraising scheduled jobs for CiviCRM upgrade
  • 21:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1101.eqiad.wmnet with OS bullseye
  • 21:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1100.eqiad.wmnet with OS bullseye
  • 21:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1102.eqiad.wmnet with OS bullseye
  • 21:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1099.eqiad.wmnet with OS bullseye
  • 21:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1101.eqiad.wmnet with reason: host reimage
  • 21:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1100.eqiad.wmnet with reason: host reimage
  • 21:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1098.eqiad.wmnet with OS bullseye
  • 21:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1102.eqiad.wmnet with reason: host reimage
  • 21:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1099.eqiad.wmnet with reason: host reimage
  • 21:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1097.eqiad.wmnet with OS bullseye
  • 21:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1102.eqiad.wmnet with reason: host reimage
  • 21:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1100.eqiad.wmnet with reason: host reimage
  • 21:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1101.eqiad.wmnet with reason: host reimage
  • 21:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1099.eqiad.wmnet with reason: host reimage
  • 21:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1096.eqiad.wmnet with OS bullseye
  • 21:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1098.eqiad.wmnet with reason: host reimage
  • 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1097.eqiad.wmnet with reason: host reimage
  • 20:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1098.eqiad.wmnet with reason: host reimage
  • 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1095.eqiad.wmnet with OS bullseye
  • 20:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1097.eqiad.wmnet with reason: host reimage
  • 20:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1102.eqiad.wmnet with OS bullseye
  • 20:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1096.eqiad.wmnet with reason: host reimage
  • 20:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1100.eqiad.wmnet with OS bullseye
  • 20:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1101.eqiad.wmnet with OS bullseye
  • 20:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1099.eqiad.wmnet with OS bullseye
  • 20:54 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS buster
  • 20:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1091.eqiad.wmnet with OS bullseye
  • 20:54 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS buster
  • 20:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1096.eqiad.wmnet with reason: host reimage
  • 20:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1095.eqiad.wmnet with reason: host reimage
  • 20:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1098.eqiad.wmnet with OS bullseye
  • 20:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS buster
  • 20:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1094.eqiad.wmnet with OS bullseye
  • 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1095.eqiad.wmnet with reason: host reimage
  • 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1097.eqiad.wmnet with OS bullseye
  • 20:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1092.eqiad.wmnet with OS bullseye
  • 20:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
  • 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1096.eqiad.wmnet with OS bullseye
  • 20:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1090.eqiad.wmnet with OS bullseye
  • 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS buster
  • 20:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1091.eqiad.wmnet with reason: host reimage
  • 20:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1094.eqiad.wmnet with reason: host reimage
  • 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1092.eqiad.wmnet with reason: host reimage
  • 20:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1095.eqiad.wmnet with OS bullseye
  • 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1089.eqiad.wmnet with OS bullseye
  • 20:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1094.eqiad.wmnet with reason: host reimage
  • 20:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1090.eqiad.wmnet with reason: host reimage
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1092.eqiad.wmnet with reason: host reimage
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1091.eqiad.wmnet with reason: host reimage
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1090.eqiad.wmnet with reason: host reimage
  • 20:26 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: security updates - bking@cumin1001 - T304938
  • 20:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1089.eqiad.wmnet with reason: host reimage
  • 20:24 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 20:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1089.eqiad.wmnet with reason: host reimage
  • 20:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1094.eqiad.wmnet with OS bullseye
  • 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1092.eqiad.wmnet with OS bullseye
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1091.eqiad.wmnet with OS bullseye
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1090.eqiad.wmnet with OS bullseye
  • 20:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1089.eqiad.wmnet with OS bullseye
  • 20:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1009.eqiad.wmnet
  • 20:03 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
  • 20:02 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
  • 19:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1009.eqiad.wmnet
  • 19:57 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1008.eqiad.wmnet
  • 19:57 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
  • 19:47 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 19:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 19:46 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 19:46 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 19:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1008.eqiad.wmnet
  • 19:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1102.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:29 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
  • 19:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1099.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1098.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1102.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1097.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1096.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1095.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:04 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1099.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1098.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1097.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1096.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1095.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T304938
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1094.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1092.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1091.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1089.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1090.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:59 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1027.eqiad.wmnet
  • 18:53 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1028.eqiad.wmnet
  • 18:48 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1027.eqiad.wmnet
  • 18:48 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1029.eqiad.wmnet
  • 18:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1094.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1092.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1091.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1090.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:43 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1028.eqiad.wmnet
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1089.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:43 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1030.eqiad.wmnet
  • 18:39 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1029.eqiad.wmnet
  • 18:38 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1031.eqiad.wmnet
  • 18:35 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1030.eqiad.wmnet
  • 18:34 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1032.eqiad.wmnet
  • 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:32 ryankemper: [Elastic] Pooled `elastic1052` (likely was erroneously left depooled after https://phabricator.wikimedia.org/P19885)
  • 18:29 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1031.eqiad.wmnet
  • 18:29 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1033.eqiad.wmnet
  • 18:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:25 razzi@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1032.eqiad.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1034.eqiad.wmnet
  • 18:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1033.eqiad.wmnet
  • 18:17 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1035.eqiad.wmnet
  • 18:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet with OS bullseye
  • 18:09 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 18:08 ryankemper: [WCQS Deploy] Successful test query placed on commons-query.wikimedia.org, there's no relevant criticals in Icinga, and Grafana looks good. WCQS deploy complete
  • 18:08 ryankemper: [WCQS Deploy] Restarted `wcqs-updater` across all hosts
  • 18:08 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1034.eqiad.wmnet
  • 18:07 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1035.eqiad.wmnet
  • 18:07 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1037.eqiad.wmnet
  • 18:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 18:02 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1036.eqiad.wmnet
  • 18:01 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0d95eca] (wcqs): Deploy 0.3.110 to WCQS (duration: 01m 58s)
  • 18:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet with OS bullseye
  • 18:00 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.110` to `wcqs1003.eqiad.wmnet`, proceeding to rest of fleet
  • 17:59 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0d95eca] (wcqs): Deploy 0.3.110 to WCQS
  • 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T305300)', diff saved to https://phabricator.wikimedia.org/P24270 and previous config saved to /var/cache/conftool/dbconfig/20220407-175730-ladsgroup.json
  • 17:52 mutante: rebooting wtp103* servers
  • 17:52 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1037.eqiad.wmnet
  • 17:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 17:50 ryankemper: T293862 Removed touched files so that it'll be easier to see when the new jvmquake threshold is crossed: `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-public' "rm -fv '/tmp/wdqs_blazegraph_jvmquake_warn_gc'"`
  • 17:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 17:46 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1036.eqiad.wmnet
  • 17:44 ryankemper: T293862 Rolling restart of wdqs public is complete; new jvmquake settings have been uptaken on wdqs public hosts: `-agentpath:/usr/lib/libjvmquake.so=1000,5,0,warn=60,touch=/tmp/wdqs_blazegraph_jvmquake_warn_gc`
  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P24269 and previous config saved to /var/cache/conftool/dbconfig/20220407-174224-ladsgroup.json
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 17:40 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 17:40 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 17:40 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 17:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 17:38 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0d95eca]: 0.3.110 (duration: 06m 21s)
  • 17:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 17:32 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.110` on canary `wdqs1003`; proceeding to rest of fleet
  • 17:31 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0d95eca]: 0.3.110
  • 17:31 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.110`. Pre-deploy tests passing on canary `wdqs1003`
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 17:31 ryankemper: [WDQS] T293862 Need to do a rolling restart of wdqs public; going to just roll a full deploy since it's equal work
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P24268 and previous config saved to /var/cache/conftool/dbconfig/20220407-172719-ladsgroup.json
  • 17:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1004.eqiad.wmnet with OS bullseye
  • 17:17 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1003.eqiad.wmnet with OS bullseye
  • 17:16 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 17:14 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:14 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T305300)', diff saved to https://phabricator.wikimedia.org/P24267 and previous config saved to /var/cache/conftool/dbconfig/20220407-171211-ladsgroup.json
  • 17:12 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 17:11 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T305300)', diff saved to https://phabricator.wikimedia.org/P24266 and previous config saved to /var/cache/conftool/dbconfig/20220407-171105-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24265 and previous config saved to /var/cache/conftool/dbconfig/20220407-171052-ladsgroup.json
  • 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 17:09 herron@cumin1001: END (FAIL) - Cookbook sre.kafka.reboot-workers (exit_code=99) for Kafka logging-codfw cluster: Reboot kafka nodes
  • 17:08 herron@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka logging-codfw cluster: Reboot kafka nodes
  • 17:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 17:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 16:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P24264 and previous config saved to /var/cache/conftool/dbconfig/20220407-165547-ladsgroup.json
  • 16:50 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T304938
  • 16:49 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T304938
  • 16:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 16:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
  • 16:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P24263 and previous config saved to /var/cache/conftool/dbconfig/20220407-164042-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24262 and previous config saved to /var/cache/conftool/dbconfig/20220407-162537-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24261 and previous config saved to /var/cache/conftool/dbconfig/20220407-162430-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24260 and previous config saved to /var/cache/conftool/dbconfig/20220407-162421-ladsgroup.json
  • 16:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 16:17 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P24259 and previous config saved to /var/cache/conftool/dbconfig/20220407-160916-ladsgroup.json
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P24258 and previous config saved to /var/cache/conftool/dbconfig/20220407-155410-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24257 and previous config saved to /var/cache/conftool/dbconfig/20220407-153905-ladsgroup.json
  • 15:21 mmandere: pool cp6004 with HAProxy as TLS termination layer - T290005
  • 15:14 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1171.eqiad.wmnet with OS bullseye
  • 15:12 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS buster
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24256 and previous config saved to /var/cache/conftool/dbconfig/20220407-150640-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 15:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24255 and previous config saved to /var/cache/conftool/dbconfig/20220407-150632-ladsgroup.json
  • 14:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: host reimage
  • 14:56 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: host reimage
  • 14:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2144.codfw.wmnet with reason: Rebooting for T303174
  • 14:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2144.codfw.wmnet with reason: Rebooting for T303174
  • 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2143 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P24254 and previous config saved to /var/cache/conftool/dbconfig/20220407-145455-kormat.json
  • 14:51 kormat@cumin1001: dbctl commit (dc=all): 'db2143 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P24253 and previous config saved to /var/cache/conftool/dbconfig/20220407-145139-kormat.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P24252 and previous config saved to /var/cache/conftool/dbconfig/20220407-145127-ladsgroup.json
  • 14:44 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1171.eqiad.wmnet with OS bullseye
  • 14:44 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
  • 14:41 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
  • 14:36 kormat@cumin1001: dbctl commit (dc=all): 'db2143 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P24251 and previous config saved to /var/cache/conftool/dbconfig/20220407-143635-kormat.json
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P24250 and previous config saved to /var/cache/conftool/dbconfig/20220407-143622-ladsgroup.json
  • 14:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2143.codfw.wmnet with reason: Rebooting for T303174
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2143.codfw.wmnet with reason: Rebooting for T303174
  • 14:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2131.codfw.wmnet with reason: Rebooting for T303174
  • 14:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2131.codfw.wmnet with reason: Rebooting for T303174
  • 14:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS buster
  • 14:22 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2141.codfw.wmnet with OS bullseye
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24249 and previous config saved to /var/cache/conftool/dbconfig/20220407-142117-ladsgroup.json
  • 14:19 mmandere: depool cp6004 for reimage - T290005
  • 14:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2115.codfw.wmnet with reason: Rebooting for T303174
  • 14:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2115.codfw.wmnet with reason: Rebooting for T303174
  • 14:13 mmandere: pool cp6012 with HAProxy as TLS termination layer - T290005
  • 14:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
  • 14:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2025.codfw.wmnet with reason: Rebooting for T303174
  • 14:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2025.codfw.wmnet with reason: Rebooting for T303174
  • 14:08 mmandere: pool cp6005 with HAProxy as TLS termination layer - T290005
  • 14:06 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS buster
  • 14:06 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2141.codfw.wmnet with reason: host reimage
  • 14:04 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T304938
  • 14:03 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2141.codfw.wmnet with reason: host reimage
  • 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2034.codfw.wmnet with reason: Rebooting for T303174
  • 14:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2034.codfw.wmnet with reason: Rebooting for T303174
  • 13:55 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1150.eqiad.wmnet with OS bullseye
  • 13:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2029.codfw.wmnet with reason: Rebooting for T303174
  • 13:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2029.codfw.wmnet with reason: Rebooting for T303174
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24248 and previous config saved to /var/cache/conftool/dbconfig/20220407-135052-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24247 and previous config saved to /var/cache/conftool/dbconfig/20220407-135044-ladsgroup.json
  • 13:49 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2141.codfw.wmnet with OS bullseye
  • 13:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2027.codfw.wmnet with reason: Rebooting for T303174
  • 13:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2027.codfw.wmnet with reason: Rebooting for T303174
  • 13:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 13:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2033.codfw.wmnet with reason: Rebooting for T303174
  • 13:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2033.codfw.wmnet with reason: Rebooting for T303174
  • 13:39 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: host reimage
  • 13:37 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 13:36 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: host reimage
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24246 and previous config saved to /var/cache/conftool/dbconfig/20220407-133539-ladsgroup.json
  • 13:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2031.codfw.wmnet with reason: Rebooting for T303174
  • 13:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2031.codfw.wmnet with reason: Rebooting for T303174
  • 13:33 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
  • 13:30 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
  • 13:29 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2139.codfw.wmnet with OS bullseye
  • 13:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2026.codfw.wmnet with reason: Rebooting for T303174
  • 13:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2026.codfw.wmnet with reason: Rebooting for T303174
  • 13:24 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1150.eqiad.wmnet with OS bullseye
  • 13:20 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24245 and previous config saved to /var/cache/conftool/dbconfig/20220407-132034-ladsgroup.json
  • 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2032.codfw.wmnet with reason: Rebooting for T303174
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2032.codfw.wmnet with reason: Rebooting for T303174
  • 13:14 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1145.eqiad.wmnet with OS bullseye
  • 13:13 mmandere: depool cp6012 for reimage - T290005
  • 13:13 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2139.codfw.wmnet with reason: host reimage
  • 13:12 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS buster
  • 13:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2030.codfw.wmnet with reason: Rebooting for T303174
  • 13:10 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2139.codfw.wmnet with reason: host reimage
  • 13:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2030.codfw.wmnet with reason: Rebooting for T303174
  • 13:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubemaster2001.codfw.wmnet with reason: reimage
  • 13:08 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubemaster2001.codfw.wmnet with reason: reimage
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24244 and previous config saved to /var/cache/conftool/dbconfig/20220407-130529-ladsgroup.json
  • 13:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2028.codfw.wmnet with reason: Rebooting for T303174
  • 13:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2028.codfw.wmnet with reason: Rebooting for T303174
  • 12:58 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: host reimage
  • 12:58 mmandere: depool cp6005 for reimage - T290005
  • 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2104.codfw.wmnet with reason: Rebooting for T303174
  • 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2104.codfw.wmnet with reason: Rebooting for T303174
  • 12:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 12:57 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 12:55 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: host reimage
  • 12:55 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2139.codfw.wmnet with OS bullseye
  • 12:55 mmandere: pool cp6013 with HAProxy as TLS termination layer - T290005
  • 12:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
  • 12:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2135.codfw.wmnet with reason: Rebooting for T303174
  • 12:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2135.codfw.wmnet with reason: Rebooting for T303174
  • 12:49 akosiaris: sudo gnt-cluster modify -H kvm:migration_downtime=3000 for ganeti01.svc.codfw.wmnet and ganeti01.svc.eqiad.wmnet to combat some logstash VM migration issues.
  • 12:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2134.codfw.wmnet with reason: Rebooting for T303174
  • 12:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2134.codfw.wmnet with reason: Rebooting for T303174
  • 12:44 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1145.eqiad.wmnet with OS bullseye
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2133.codfw.wmnet with reason: Rebooting for T303174
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2133.codfw.wmnet with reason: Rebooting for T303174
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2078,2133].codfw.wmnet with reason: Rebooting primary T303174
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2078,2133].codfw.wmnet with reason: Rebooting primary T303174
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2132.codfw.wmnet with reason: Rebooting for T303174
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2132.codfw.wmnet with reason: Rebooting for T303174
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2078,2132].codfw.wmnet with reason: Rebooting primary T303174
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2078,2132].codfw.wmnet with reason: Rebooting primary T303174
  • 12:32 mmandere: pool cp3051 with HAProxy as TLS termination layer - T290005
  • 12:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3051.esams.wmnet with OS buster
  • 12:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2096.codfw.wmnet with reason: Rebooting for T303174
  • 12:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2096.codfw.wmnet with reason: Rebooting for T303174
  • 12:19 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:30:00 on db2096.codfw.wmnet with reason: Rebooting for T303174
  • 12:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2096.codfw.wmnet with reason: Rebooting for T303174
  • 12:08 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 12:06 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24243 and previous config saved to /var/cache/conftool/dbconfig/20220407-120514-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24242 and previous config saved to /var/cache/conftool/dbconfig/20220407-120507-ladsgroup.json
  • 12:03 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 12:03 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24241 and previous config saved to /var/cache/conftool/dbconfig/20220407-115002-ladsgroup.json
  • 11:49 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1140.eqiad.wmnet with OS bullseye
  • 11:46 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2101.codfw.wmnet with OS bullseye
  • 11:45 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
  • 11:35 mmandere: depool cp6013 for reimage - T290005
  • 11:35 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1140.eqiad.wmnet with reason: host reimage
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24240 and previous config saved to /var/cache/conftool/dbconfig/20220407-113455-ladsgroup.json
  • 11:34 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3051.esams.wmnet with OS buster
  • 11:32 jforrester@deploy1002: Finished deploy [integration/docroot@d88e2fa]: d88e2fa19fd6 [WikiLambda] Fix link typo and re-group/re-word other links (duration: 00m 09s)
  • 11:32 jforrester@deploy1002: Started deploy [integration/docroot@d88e2fa]: d88e2fa19fd6 [WikiLambda] Fix link typo and re-group/re-word other links
  • 11:31 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2101.codfw.wmnet with reason: host reimage
  • 11:31 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1140.eqiad.wmnet with reason: host reimage
  • 11:28 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2101.codfw.wmnet with reason: host reimage
  • 11:23 mmandere: depool cp3051 for reimage - T290005
  • 11:23 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1140.eqiad.wmnet with OS bullseye
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24239 and previous config saved to /var/cache/conftool/dbconfig/20220407-111950-ladsgroup.json
  • 11:17 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2101.codfw.wmnet with OS bullseye
  • 11:17 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1139.eqiad.wmnet with OS bullseye
  • 11:16 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:15 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:12 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2100.codfw.wmnet with OS bullseye
  • 11:03 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1139.eqiad.wmnet with reason: host reimage
  • 10:59 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1139.eqiad.wmnet with reason: host reimage
  • 10:59 mmandere: pool cp3053 with HAProxy as TLS termination layer - T290005
  • 10:58 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2100.codfw.wmnet with reason: host reimage
  • 10:55 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2100.codfw.wmnet with reason: host reimage
  • 10:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3053.esams.wmnet with OS buster
  • 10:51 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS bullseye
  • 10:45 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:44 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2100.codfw.wmnet with OS bullseye
  • 10:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:41 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1116.eqiad.wmnet with OS bullseye
  • 10:40 mmandere: pool cp6006 with HAProxy as TLS termination layer - T290005
  • 10:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubemaster2002.codfw.wmnet with reason: reimage
  • 10:37 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubemaster2002.codfw.wmnet with reason: reimage
  • 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24238 and previous config saved to /var/cache/conftool/dbconfig/20220407-103739-ladsgroup.json
  • 10:37 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS buster
  • 10:36 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:36 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:35 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 10:35 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24237 and previous config saved to /var/cache/conftool/dbconfig/20220407-102821-root.json
  • 10:28 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: host reimage
  • 10:27 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2099.codfw.wmnet with OS bullseye
  • 10:25 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:24 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: host reimage
  • 10:24 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24236 and previous config saved to /var/cache/conftool/dbconfig/20220407-102234-ladsgroup.json
  • 10:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24235 and previous config saved to /var/cache/conftool/dbconfig/20220407-101936-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T305300)', diff saved to https://phabricator.wikimedia.org/P24234 and previous config saved to /var/cache/conftool/dbconfig/20220407-101928-ladsgroup.json
  • 10:16 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1116.eqiad.wmnet with OS bullseye
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24233 and previous config saved to /var/cache/conftool/dbconfig/20220407-101318-root.json
  • 10:12 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2099.codfw.wmnet with reason: host reimage
  • 10:09 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2099.codfw.wmnet with reason: host reimage
  • 10:08 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 10:08 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1102.eqiad.wmnet with OS bullseye
  • 10:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1007.eqiad.wmnet
  • 10:08 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 10:07 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24232 and previous config saved to /var/cache/conftool/dbconfig/20220407-100729-ladsgroup.json
  • 10:06 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 10:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P24231 and previous config saved to /var/cache/conftool/dbconfig/20220407-100423-ladsgroup.json
  • 10:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
  • 10:00 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
  • 10:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
  • 09:58 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2099.codfw.wmnet with OS bullseye
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24230 and previous config saved to /var/cache/conftool/dbconfig/20220407-095814-root.json
  • 09:56 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24229 and previous config saved to /var/cache/conftool/dbconfig/20220407-095624-root.json
  • 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1007.eqiad.wmnet
  • 09:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1102.eqiad.wmnet with reason: host reimage
  • 09:54 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24228 and previous config saved to /var/cache/conftool/dbconfig/20220407-095224-ladsgroup.json
  • 09:51 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1102.eqiad.wmnet with reason: host reimage
  • 09:50 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:50 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P24227 and previous config saved to /var/cache/conftool/dbconfig/20220407-094917-ladsgroup.json
  • 09:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:43 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1102.eqiad.wmnet with OS bullseye
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P24226 and previous config saved to /var/cache/conftool/dbconfig/20220407-094310-root.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24225 and previous config saved to /var/cache/conftool/dbconfig/20220407-094120-root.json
  • 09:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2129.codfw.wmnet with reason: Rebooting for T303174
  • 09:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2129.codfw.wmnet with reason: Rebooting for T303174
  • 09:39 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS buster
  • 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 09:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 09:35 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2098.codfw.wmnet with OS bullseye
  • 09:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 09:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 09:34 mmandere: depool cp6006 for reimage - T290005
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T305300)', diff saved to https://phabricator.wikimedia.org/P24224 and previous config saved to /var/cache/conftool/dbconfig/20220407-093412-ladsgroup.json
  • 09:33 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS buster
  • 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2123.codfw.wmnet with reason: Rebooting for T303174
  • 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2123.codfw.wmnet with reason: Rebooting for T303174
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24223 and previous config saved to /var/cache/conftool/dbconfig/20220407-092616-root.json
  • 09:25 mmandere: depool cp3053 for reimage - T290005
  • 09:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2105.codfw.wmnet with reason: Rebooting for T303174
  • 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2105.codfw.wmnet with reason: Rebooting for T303174
  • 09:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Rebooting primary T303174
  • 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Rebooting primary T303174
  • 09:20 mmandere: pool cp6014 with HAProxy as TLS termination layer - T290005
  • 09:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
  • 09:14 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
  • 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2152.codfw.wmnet with reason: Rebooting for T303174
  • 09:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2152.codfw.wmnet with reason: Rebooting for T303174
  • 09:11 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P24222 and previous config saved to /var/cache/conftool/dbconfig/20220407-091112-root.json
  • 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2150.codfw.wmnet with reason: Rebooting for T303174
  • 09:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2150.codfw.wmnet with reason: Rebooting for T303174
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T305300)', diff saved to https://phabricator.wikimedia.org/P24221 and previous config saved to /var/cache/conftool/dbconfig/20220407-090201-ladsgroup.json
  • 09:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:01 mmandere: pool cp3050 with HAProxy as TLS termination layer - T290005
  • 09:00 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2098.codfw.wmnet with OS bullseye
  • 08:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2122.codfw.wmnet with reason: Rebooting for T303174
  • 08:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2122.codfw.wmnet with reason: Rebooting for T303174
  • 08:56 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3050.esams.wmnet with OS buster
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P24220 and previous config saved to /var/cache/conftool/dbconfig/20220407-085608-root.json
  • 08:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24219 and previous config saved to /var/cache/conftool/dbconfig/20220407-084140-marostegui.json
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P24218 and previous config saved to /var/cache/conftool/dbconfig/20220407-084103-root.json
  • 08:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24217 and previous config saved to /var/cache/conftool/dbconfig/20220407-083209-marostegui.json
  • 08:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 08:27 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
  • 08:26 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 08:23 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
  • 08:23 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24216 and previous config saved to /var/cache/conftool/dbconfig/20220407-081910-ladsgroup.json
  • 08:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P24215 and previous config saved to /var/cache/conftool/dbconfig/20220407-081704-marostegui.json
  • 08:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:13 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.6 refs T305212
  • 08:09 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P24214 and previous config saved to /var/cache/conftool/dbconfig/20220407-080159-marostegui.json
  • 08:00 mmandere: depool cp6014 for reimage - T290005
  • 07:55 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3050.esams.wmnet with OS buster
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24213 and previous config saved to /var/cache/conftool/dbconfig/20220407-074654-marostegui.json
  • 07:44 mmandere: depool cp3050 for reimage - T290005
  • 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163', diff saved to https://phabricator.wikimedia.org/P24212 and previous config saved to /var/cache/conftool/dbconfig/20220407-073013-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300775)', diff saved to https://phabricator.wikimedia.org/P24211 and previous config saved to /var/cache/conftool/dbconfig/20220407-072813-marostegui.json
  • 07:17 hashar: CI and Gerrit are back up
  • 07:14 hashar: gerrit1001.wikimedia.org: restarted apache2 service
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24210 and previous config saved to /var/cache/conftool/dbconfig/20220407-071308-marostegui.json
  • 07:10 hashar: Restarting contint2001.wikimedia.org
  • 07:10 hashar: Restarting gerrit1001.wikimedia.org
  • 07:02 hashar: Restarting contint1001.wikimedia.org
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24209 and previous config saved to /var/cache/conftool/dbconfig/20220407-065803-marostegui.json
  • 06:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 06:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 06:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300775)', diff saved to https://phabricator.wikimedia.org/P24208 and previous config saved to /var/cache/conftool/dbconfig/20220407-064258-marostegui.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24207 and previous config saved to /var/cache/conftool/dbconfig/20220407-062736-marostegui.json
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24206 and previous config saved to /var/cache/conftool/dbconfig/20220407-062728-marostegui.json
  • 06:27 ryankemper: [Elastic] Manually restarted elasticsearch exporters on `elastic2043` and `elastic2058`
  • 06:25 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - ryankemper@cumin1001 - T304938
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P24205 and previous config saved to /var/cache/conftool/dbconfig/20220407-061223-marostegui.json
  • 06:00 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - ryankemper@cumin1001 - T304938
  • 05:58 ryankemper: [Elastic] Manually restarted elasticsearch exporters on `cloudelastic1004` and `elastic2054`
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P24203 and previous config saved to /var/cache/conftool/dbconfig/20220407-055718-marostegui.json
  • 05:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - ryankemper@cumin1001 - T304938
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24202 and previous config saved to /var/cache/conftool/dbconfig/20220407-054213-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2076 db2086:3317 db2086:3318 db2107 db2137:3314 db2137:3315 db2143 db2147 es2029 es2030 T305469', diff saved to https://phabricator.wikimedia.org/P24201 and previous config saved to /var/cache/conftool/dbconfig/20220407-050149-root.json
  • 04:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24200 and previous config saved to /var/cache/conftool/dbconfig/20220407-044158-marostegui.json
  • 04:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 04:29 ryankemper: [Elastic] for future reference, we still need to fix the fact that we haven't told systemd that the prometheus-wmf-elasticsearch exporters need to start after the actual elasticsearch service
  • 04:13 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - ryankemper@cumin1001 - T304938
  • 04:13 ryankemper: [Elastic] Beginning rolling reboot of codfw elastic to apply kernel security updates: `ryankemper@cumin1001:~$ sudo -E cookbook sre.elasticsearch.rolling-operation search_codfw "codfw cluster reboot" --reboot --nodes-per-run 3 --start-datetime 2022-04-07T04:09:05 --task-id T304938`
  • 02:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24199 and previous config saved to /var/cache/conftool/dbconfig/20220407-024347-marostegui.json
  • 02:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P24198 and previous config saved to /var/cache/conftool/dbconfig/20220407-022842-marostegui.json
  • 02:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P24197 and previous config saved to /var/cache/conftool/dbconfig/20220407-021337-marostegui.json
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24196 and previous config saved to /var/cache/conftool/dbconfig/20220407-015832-marostegui.json
  • 00:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24195 and previous config saved to /var/cache/conftool/dbconfig/20220407-005817-marostegui.json
  • 00:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T297189)', diff saved to https://phabricator.wikimedia.org/P24194 and previous config saved to /var/cache/conftool/dbconfig/20220407-005809-marostegui.json
  • 00:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P24193 and previous config saved to /var/cache/conftool/dbconfig/20220407-004304-marostegui.json
  • 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P24192 and previous config saved to /var/cache/conftool/dbconfig/20220407-002759-marostegui.json
  • 00:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T297189)', diff saved to https://phabricator.wikimedia.org/P24191 and previous config saved to /var/cache/conftool/dbconfig/20220407-001254-marostegui.json

2022-04-06

  • 23:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:47 krinkle@deploy1002: Synchronized w/static.php: Ic87a8a3d00db (duration: 00m 53s)
  • 23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T297189)', diff saved to https://phabricator.wikimedia.org/P24190 and previous config saved to /var/cache/conftool/dbconfig/20220406-232126-marostegui.json
  • 23:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 23:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24189 and previous config saved to /var/cache/conftool/dbconfig/20220406-232118-marostegui.json
  • 23:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:10 krinkle@deploy1002: Synchronized w/static: I5a05f4728 (duration: 00m 54s)
  • 23:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P24188 and previous config saved to /var/cache/conftool/dbconfig/20220406-230613-marostegui.json
  • 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300775)', diff saved to https://phabricator.wikimedia.org/P24187 and previous config saved to /var/cache/conftool/dbconfig/20220406-230118-marostegui.json
  • 23:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 23:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 23:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300775)', diff saved to https://phabricator.wikimedia.org/P24186 and previous config saved to /var/cache/conftool/dbconfig/20220406-230110-marostegui.json
  • 22:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P24185 and previous config saved to /var/cache/conftool/dbconfig/20220406-225108-marostegui.json
  • 22:49 mutante: parse2004, parse2003 - rebooting
  • 22:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24184 and previous config saved to /var/cache/conftool/dbconfig/20220406-224605-marostegui.json
  • 22:42 mutante: parse2006, parse2005 - rebooting
  • 22:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24183 and previous config saved to /var/cache/conftool/dbconfig/20220406-223603-marostegui.json
  • 22:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 22:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24182 and previous config saved to /var/cache/conftool/dbconfig/20220406-223100-marostegui.json
  • 22:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 22:26 mutante: parse2007, parse2008 - rebooting
  • 22:16 mutante: parse2009, parse2010 - rebooting
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300775)', diff saved to https://phabricator.wikimedia.org/P24181 and previous config saved to /var/cache/conftool/dbconfig/20220406-221555-marostegui.json
  • 22:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 22:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
  • 22:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
  • 22:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
  • 22:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 22:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
  • 22:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
  • 22:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
  • 21:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:57 mutante: parse2011, parse2012 - rebooting
  • 21:51 mutante: parse2013, parse2014 - rebooting
  • 21:46 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1039.eqiad.wmnet
  • 21:42 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@a1c5c6f]: (no justification provided) (duration: 04m 34s)
  • 21:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 21:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 21:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 21:38 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@a1c5c6f]: (no justification provided)
  • 21:37 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: security updates - bking@cumin1001 - T304938
  • 21:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24180 and previous config saved to /var/cache/conftool/dbconfig/20220406-213605-marostegui.json
  • 21:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24179 and previous config saved to /var/cache/conftool/dbconfig/20220406-213557-marostegui.json
  • 21:35 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1039.eqiad.wmnet
  • 21:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1038.eqiad.wmnet
  • 21:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:26 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1038.eqiad.wmnet
  • 21:26 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1037.eqiad.wmnet
  • 21:21 mutante: wtp1037,wtp1038,wtp1039 - rebooting sequentially
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P24178 and previous config saved to /var/cache/conftool/dbconfig/20220406-212052-marostegui.json
  • 21:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1037.eqiad.wmnet
  • 21:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1037.wmnet
  • 21:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=wtp1037.wmnet
  • 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P24177 and previous config saved to /var/cache/conftool/dbconfig/20220406-210545-marostegui.json
  • 21:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:03 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 20:56 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 20:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: security updates - bking@cumin1001 - T304938
  • 20:51 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 20:50 cjming: end of UTC late backport & config window
  • 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24176 and previous config saved to /var/cache/conftool/dbconfig/20220406-205040-marostegui.json
  • 20:46 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 20:38 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:38 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:38 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:35 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 cjming@deploy1002: Synchronized php-1.39.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents/desktopWebUIActions.js: Backport: Update to 78eef14, rename viewportSize to viewportSizeBucket (T301391) (duration: 00m 55s)
  • 20:03 mutante: phabricator about to be rebooted - hang on
  • 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24174 and previous config saved to /var/cache/conftool/dbconfig/20220406-195925-marostegui.json
  • 19:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T297189)', diff saved to https://phabricator.wikimedia.org/P24173 and previous config saved to /var/cache/conftool/dbconfig/20220406-195917-marostegui.json
  • 19:59 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: reboot for maintenance
  • 19:58 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: reboot for maintenance
  • 19:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 19:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 19:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 19:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P24172 and previous config saved to /var/cache/conftool/dbconfig/20220406-194412-marostegui.json
  • 19:31 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 19:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 19:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P24171 and previous config saved to /var/cache/conftool/dbconfig/20220406-192907-marostegui.json
  • 19:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:23 rook@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudvirt1016.eqiad.wmnet
  • 19:23 rook@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1016.eqiad.wmnet
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T297189)', diff saved to https://phabricator.wikimedia.org/P24170 and previous config saved to /var/cache/conftool/dbconfig/20220406-191402-marostegui.json
  • 19:13 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 19:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host dse-k8s-worker1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host dse-k8s-worker1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host dse-k8s-worker1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host dse-k8s-worker1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T297189)', diff saved to https://phabricator.wikimedia.org/P24169 and previous config saved to /var/cache/conftool/dbconfig/20220406-183927-marostegui.json
  • 18:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 18:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T297189)', diff saved to https://phabricator.wikimedia.org/P24168 and previous config saved to /var/cache/conftool/dbconfig/20220406-183919-marostegui.json
  • 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P24167 and previous config saved to /var/cache/conftool/dbconfig/20220406-182414-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P24166 and previous config saved to /var/cache/conftool/dbconfig/20220406-180909-marostegui.json
  • 18:01 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 17:58 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T297189)', diff saved to https://phabricator.wikimedia.org/P24165 and previous config saved to /var/cache/conftool/dbconfig/20220406-175403-marostegui.json
  • 17:42 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 17:25 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2097.codfw.wmnet with OS bullseye
  • 17:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2120.codfw.wmnet with reason: Rebooting for T303174
  • 17:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2120.codfw.wmnet with reason: Rebooting for T303174
  • 17:11 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2097.codfw.wmnet with reason: host reimage
  • 17:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2118.codfw.wmnet with reason: Rebooting for T303174
  • 17:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2118.codfw.wmnet with reason: Rebooting for T303174
  • 17:08 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2097.codfw.wmnet with reason: host reimage
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T297189)', diff saved to https://phabricator.wikimedia.org/P24164 and previous config saved to /var/cache/conftool/dbconfig/20220406-170223-marostegui.json
  • 17:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:01 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2108.codfw.wmnet with reason: Rebooting for T303174
  • 17:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2108.codfw.wmnet with reason: Rebooting for T303174
  • 16:57 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2097.codfw.wmnet with OS bullseye
  • 16:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2124.codfw.wmnet with reason: Rebooting for T303174
  • 16:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2124.codfw.wmnet with reason: Rebooting for T303174
  • 16:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2117.codfw.wmnet with reason: Rebooting for T303174
  • 16:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2117.codfw.wmnet with reason: Rebooting for T303174
  • 16:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2114.codfw.wmnet with reason: Rebooting for T303174
  • 16:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2114.codfw.wmnet with reason: Rebooting for T303174
  • 16:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2128.codfw.wmnet with reason: Rebooting for T303174
  • 16:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2128.codfw.wmnet with reason: Rebooting for T303174
  • 16:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2113.codfw.wmnet with reason: Rebooting for T303174
  • 16:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2113.codfw.wmnet with reason: Rebooting for T303174
  • 16:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2111.codfw.wmnet with reason: Rebooting for T303174
  • 16:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2111.codfw.wmnet with reason: Rebooting for T303174
  • 16:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2147.codfw.wmnet with reason: Rebooting for T303174
  • 16:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2147.codfw.wmnet with reason: Rebooting for T303174
  • 16:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:02 mforns@deploy1002: Finished deploy [airflow-dags/analytics@b029f10]: (no justification provided) (duration: 00m 08s)
  • 16:02 mforns@deploy1002: Started deploy [airflow-dags/analytics@b029f10]: (no justification provided)
  • 15:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2140.codfw.wmnet with reason: Rebooting for T303174
  • 15:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2140.codfw.wmnet with reason: Rebooting for T303174
  • 15:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc2040.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2040.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:51 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3018fdb]: (no justification provided) (duration: 00m 07s)
  • 15:51 mforns@deploy1002: Started deploy [airflow-dags/analytics@3018fdb]: (no justification provided)
  • 15:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2137.codfw.wmnet with reason: Rebooting for T303174
  • 15:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2137.codfw.wmnet with reason: Rebooting for T303174
  • 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2136.codfw.wmnet with reason: Rebooting for T303174
  • 15:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2136.codfw.wmnet with reason: Rebooting for T303174
  • 15:33 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2119.codfw.wmnet with reason: Rebooting for T303174
  • 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2119.codfw.wmnet with reason: Rebooting for T303174
  • 15:31 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:29 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:28 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:15 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:11 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:06 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:06 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@3018fdb]: (no justification provided) (duration: 00m 07s)
  • 15:06 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:06 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@3018fdb]: (no justification provided)
  • 15:06 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:04 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:04 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:02 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1006.eqiad.wmnet
  • 15:02 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:01 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@dc748fb]: (no justification provided) (duration: 00m 08s)
  • 15:01 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@dc748fb]: (no justification provided)
  • 14:58 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:58 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:57 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:57 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:57 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:57 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:55 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:55 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:55 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2106.codfw.wmnet with reason: Rebooting for T303174
  • 14:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2106.codfw.wmnet with reason: Rebooting for T303174
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:52 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:52 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1006.eqiad.wmnet
  • 14:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2149.codfw.wmnet with reason: Rebooting for T303174
  • 14:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2149.codfw.wmnet with reason: Rebooting for T303174
  • 14:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2127.codfw.wmnet with reason: Rebooting for T303174
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2127.codfw.wmnet with reason: Rebooting for T303174
  • 14:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 14:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 14:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24163 and previous config saved to /var/cache/conftool/dbconfig/20220406-143647-marostegui.json
  • 14:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2109.codfw.wmnet with reason: Rebooting for T303174
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2109.codfw.wmnet with reason: Rebooting for T303174
  • 14:27 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1005.eqiad.wmnet
  • 14:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2148.codfw.wmnet with reason: Rebooting for T303174
  • 14:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2148.codfw.wmnet with reason: Rebooting for T303174
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P24162 and previous config saved to /var/cache/conftool/dbconfig/20220406-142142-marostegui.json
  • 14:21 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1005.eqiad.wmnet
  • 14:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:08 mmandere: pool cp4021 with HAProxy as TLS termination layer - T290005
  • 14:06 moritzm: installing webperf2004 T305460
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P24160 and previous config saved to /var/cache/conftool/dbconfig/20220406-140637-marostegui.json
  • 14:05 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:02 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1004.eqiad.wmnet
  • 14:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
  • 13:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2138.codfw.wmnet with reason: Rebooting for T303174
  • 13:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2138.codfw.wmnet with reason: Rebooting for T303174
  • 13:55 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:55 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:53 moritzm: installing webperf2003 T305460
  • 13:52 kart_: UTC afternoon backport window - Done.
  • 13:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1004.eqiad.wmnet
  • 13:51 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Rearrange zh namespace names and namespace aliases (T286291 T298308) (duration: 00m 53s)
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24159 and previous config saved to /var/cache/conftool/dbconfig/20220406-135132-marostegui.json
  • 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2126.codfw.wmnet with reason: Rebooting for T303174
  • 13:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2126.codfw.wmnet with reason: Rebooting for T303174
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2125.codfw.wmnet with reason: Rebooting for T303174
  • 13:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2125.codfw.wmnet with reason: Rebooting for T303174
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:36 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:36 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4021.ulsfo.wmnet with reason: host reimage
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2107.codfw.wmnet with reason: Rebooting for T303174
  • 13:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2107.codfw.wmnet with reason: Rebooting for T303174
  • 13:31 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4021.ulsfo.wmnet with reason: host reimage
  • 13:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:29 kartik@deploy1002: Synchronized php-1.39.0-wmf.6/extensions/Translate/tag/PageTranslationHooks.php: Backport: Revert "PageTranslationHooks: Don't kick in during interface message parsing" (T305531) (duration: 00m 57s)
  • 13:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:26 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2095.codfw.wmnet with reason: Rebooting for T303174
  • 13:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2095.codfw.wmnet with reason: Rebooting for T303174
  • 13:20 kartik@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgUsingKubernetes the same value as to $wmfUsingKubernetes (T45956) (duration: 00m 55s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2094.codfw.wmnet with reason: Rebooting for T303174
  • 13:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2094.codfw.wmnet with reason: Rebooting for T303174
  • 13:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
  • 13:15 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:15 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
  • 13:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
  • 13:11 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:11 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:11 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:07 mmandere: depool cp4021 for reimage - T290005
  • 13:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24158 and previous config saved to /var/cache/conftool/dbconfig/20220406-125117-marostegui.json
  • 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:45 mmandere: pool cp4033 with HAProxy as TLS termination layer - T290005
  • 12:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300775)', diff saved to https://phabricator.wikimedia.org/P24157 and previous config saved to /var/cache/conftool/dbconfig/20220406-123603-marostegui.json
  • 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1163', diff saved to https://phabricator.wikimedia.org/P24156 and previous config saved to /var/cache/conftool/dbconfig/20220406-123505-root.json
  • 12:35 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24152 and previous config saved to /var/cache/conftool/dbconfig/20220406-121222-ladsgroup.json
  • 12:11 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 12:10 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:09 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24151 and previous config saved to /var/cache/conftool/dbconfig/20220406-115717-ladsgroup.json
  • 11:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4033.ulsfo.wmnet with reason: host reimage
  • 11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 11:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4033.ulsfo.wmnet with reason: host reimage
  • 11:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 11:37 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:35 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:32 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4033.ulsfo.wmnet with OS buster
  • 11:32 moritzm: installing wavpack security updates
  • 11:24 mmandere: depool cp4033 for reimage - T290005
  • 11:23 marostegui: dbmaint s3@eqiad T297189
  • 11:23 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 11:22 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:20 mmandere: pool cp4027 with HAProxy as TLS termination layer - T290005
  • 11:12 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4027.ulsfo.wmnet with OS buster
  • 11:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:03 mmandere: pool cp3052 with HAProxy as TLS termination layer - T290005
  • 11:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3052.esams.wmnet with OS buster
  • 11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:57 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24150 and previous config saved to /var/cache/conftool/dbconfig/20220406-103929-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:32 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
  • 10:30 jynus: rerunning es4 dump on backup2002
  • 10:29 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4027.ulsfo.wmnet with reason: host reimage
  • 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:28 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
  • 10:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host deploy2002.codfw.wmnet
  • 10:25 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4027.ulsfo.wmnet with reason: host reimage
  • 10:24 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 10:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
  • 10:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:10 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4027.ulsfo.wmnet with OS buster
  • 10:07 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 10:03 mmandere: depool cp4027 for reimage - T290005
  • 10:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 09:58 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS buster
  • 09:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui1001.eqiad.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host xhgui1001.eqiad.wmnet
  • 09:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 09:51 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 09:50 mmandere: depool cp3052 for reimage - T290005
  • 09:47 moritzm: installing mariadb-10.3 updates from buster 10.12 point release (different from wmf-mariadb packages)
  • 09:44 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:24 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host webperf1003.eqiad.wmnet
  • 09:19 btullis@cumin1001: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto analytics cluster: Reboot Presto nodes
  • 09:17 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:17 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:15 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:15 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1001.eqiad.wmnet
  • 09:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet with reason: Rebooting for T303174
  • 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet with reason: Rebooting for T303174
  • 09:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2023-2025].codfw.wmnet with reason: Rebooting es2023 T303174
  • 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es[2023-2025].codfw.wmnet with reason: Rebooting es2023 T303174
  • 09:08 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24148 and previous config saved to /var/cache/conftool/dbconfig/20220406-090449-ladsgroup.json
  • 09:04 arturo: force-started update-openstack-mirror.service on mirror1001 for python3-eventlet (T305157)
  • 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1001.eqiad.wmnet
  • 09:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1002.eqiad.wmnet
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 08:38 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 08:35 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:35 ayounsi@cumin2002: START - Cookbook sre.network.cf
  • 08:34 jnuche@deploy1002: deploy-promote aborted: (duration: 00m 40s)
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24146 and previous config saved to /var/cache/conftool/dbconfig/20220406-083439-ladsgroup.json
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host webperf1004.eqiad.wmnet
  • 08:27 mmandere: pool cp4035 with HAProxy as TLS termination layer - T290005
  • 08:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:21 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cp5001.eqsin.wmnet with OS buster
  • 08:20 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4035.ulsfo.wmnet with OS buster
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24145 and previous config saved to /var/cache/conftool/dbconfig/20220406-081934-ladsgroup.json
  • 08:18 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:10 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Add zh-hans and zh-hant translation of Module and Module_talk aliases" (T286291 T298308 T165593 T286105) (duration: 00m 56s)
  • 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:07 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host webperf2004.codfw.wmnet
  • 07:56 kharlan@deploy1002: Synchronized wmf-config: Config: GrowthExperiments: Add mailing list question for eswiki (T303240 T305015) (duration: 00m 56s)
  • 07:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4035.ulsfo.wmnet with reason: host reimage
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4035.ulsfo.wmnet with reason: host reimage
  • 07:40 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5001.eqsin.wmnet with reason: host reimage
  • 07:38 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:38 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host webperf2004.codfw.wmnet
  • 07:36 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5001.eqsin.wmnet with reason: host reimage
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host webperf2003.codfw.wmnet
  • 07:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2002.wikimedia.org
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
  • 07:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4035.ulsfo.wmnet with OS buster
  • 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
  • 07:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:21 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:20 mmandere: depool cp4035 for reimage - T290005
  • 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:12 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5001.eqsin.wmnet with OS buster
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:04 mmandere: depool cp5001 for reimage - T290005
  • 07:03 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:03 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host webperf2003.codfw.wmnet
  • 06:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24144 and previous config saved to /var/cache/conftool/dbconfig/20220406-064633-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 05:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 05:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 03:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 03:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:59 ejegg: updated civicrm from 87bc3114 to 7b7b284d
  • 02:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 02:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 02:38 cstone: payments-wiki revision changed from 6f888c28 to 4e42d75f
  • 01:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24143 and previous config saved to /var/cache/conftool/dbconfig/20220406-014925-ladsgroup.json
  • 01:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24142 and previous config saved to /var/cache/conftool/dbconfig/20220406-013420-ladsgroup.json
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24141 and previous config saved to /var/cache/conftool/dbconfig/20220406-011915-ladsgroup.json
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24140 and previous config saved to /var/cache/conftool/dbconfig/20220406-010410-ladsgroup.json

2022-04-05

  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24139 and previous config saved to /var/cache/conftool/dbconfig/20220405-233042-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24138 and previous config saved to /var/cache/conftool/dbconfig/20220405-224352-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24137 and previous config saved to /var/cache/conftool/dbconfig/20220405-222847-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24136 and previous config saved to /var/cache/conftool/dbconfig/20220405-221342-ladsgroup.json
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24135 and previous config saved to /var/cache/conftool/dbconfig/20220405-215837-ladsgroup.json
  • 21:21 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fd8b410] (duration: 06m 48s)
  • 21:14 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fd8b410]
  • 21:14 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410] (thin): Regular analytics weekly train THIN [analytics/refinery@fd8b410] (duration: 00m 10s)
  • 21:14 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410] (thin): Regular analytics weekly train THIN [analytics/refinery@fd8b410]
  • 21:13 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410]: Regular analytics weekly train [analytics/refinery@fd8b410] (duration: 22m 50s)
  • 21:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6014.drmrs.wmnet
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24133 and previous config saved to /var/cache/conftool/dbconfig/20220405-205822-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 20:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6014.drmrs.wmnet
  • 20:50 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410]: Regular analytics weekly train [analytics/refinery@fd8b410]
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 urbanecm: UTC late B&C window done
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mutante: puppetmaster1001 - running test downloads of geoip databases to a temp dir
  • 20:47 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 8ea8634: Change upload dialog automatic upload comments (T305303) (duration: 00m 54s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:41 razzi: deploying refinery for https://gerrit.wikimedia.org/r/c/analytics/refinery/+/776269/
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6013.drmrs.wmnet
  • 20:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 10c16c5: [config]: Undeploy GDI survey from EN,FR and ES wikis in PROD (T303962) (duration: 00m 55s)
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6013.drmrs.wmnet
  • 20:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6012.drmrs.wmnet
  • 20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24132 and previous config saved to /var/cache/conftool/dbconfig/20220405-201315-ladsgroup.json
  • 20:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6012.drmrs.wmnet
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24131 and previous config saved to /var/cache/conftool/dbconfig/20220405-195810-ladsgroup.json
  • 19:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6011.drmrs.wmnet
  • 19:49 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw(1307|1308|1309|1310|1311|1318|1334|1335|1336|1337).*
  • 19:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6011.drmrs.wmnet
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24130 and previous config saved to /var/cache/conftool/dbconfig/20220405-194305-ladsgroup.json
  • 19:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6010.drmrs.wmnet
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6010.drmrs.wmnet
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24129 and previous config saved to /var/cache/conftool/dbconfig/20220405-192800-ladsgroup.json
  • 19:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6009.drmrs.wmnet
  • 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6009.drmrs.wmnet
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6006.drmrs.wmnet
  • 18:47 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6006.drmrs.wmnet
  • 18:42 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 18:41 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 18:37 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 18:34 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.1-1+deb11u1_amd64.changes
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6005.drmrs.wmnet
  • 18:28 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.1-1_amd64.changes # T299705
  • 18:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2015.codfw.wmnet
  • 18:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2016.codfw.wmnet
  • 18:25 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2017.codfw.wmnet
  • 18:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2018.codfw.wmnet
  • 18:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6005.drmrs.wmnet
  • 18:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2019.codfw.wmnet
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2015.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2020.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2016.codfw.wmnet
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24128 and previous config saved to /var/cache/conftool/dbconfig/20220405-181712-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24127 and previous config saved to /var/cache/conftool/dbconfig/20220405-181658-ladsgroup.json
  • 18:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6004.drmrs.wmnet
  • 18:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6004.drmrs.wmnet
  • 18:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24126 and previous config saved to /var/cache/conftool/dbconfig/20220405-180153-ladsgroup.json
  • 18:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 17:59 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parse2020.codfw.wmnet
  • 17:59 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 17:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2020.codfw.wmnet
  • 17:58 mutante: rebooting hosts in the parse201* range, starting with parse2019, counting down
  • 17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6003.drmrs.wmnet
  • 17:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 17:56 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 17:54 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host parse2020.codfw.wmnet
  • 17:53 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 17:52 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse201[7-9].codfw.wmnet
  • 17:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse201[7-9].wmnet
  • 17:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2020.wmnet
  • 17:49 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6003.drmrs.wmnet
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4035.ulsfo.wmnet
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6002.drmrs.wmnet
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24125 and previous config saved to /var/cache/conftool/dbconfig/20220405-174648-ladsgroup.json
  • 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6002.drmrs.wmnet
  • 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4035.ulsfo.wmnet
  • 17:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4033.ulsfo.wmnet
  • 17:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1040.eqiad.wmnet
  • 17:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1041.eqiad.wmnet
  • 17:32 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1040.eqiad.wmnet
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24124 and previous config saved to /var/cache/conftool/dbconfig/20220405-173143-ladsgroup.json
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6001.drmrs.wmnet
  • 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4033.ulsfo.wmnet
  • 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4021.ulsfo.wmnet
  • 17:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1042.eqiad.wmnet
  • 17:28 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1041.eqiad.wmnet
  • 17:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6001.drmrs.wmnet
  • 17:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1043.eqiad.wmnet
  • 17:23 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1042.eqiad.wmnet
  • 17:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1146.eqiad.wmnet with OS buster
  • 17:22 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4021.ulsfo.wmnet
  • 17:21 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1044.eqiad.wmnet
  • 17:21 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 17:18 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1043.eqiad.wmnet
  • 17:17 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1045.eqiad.wmnet
  • 17:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1044.eqiad.wmnet
  • 17:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:13 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 17:12 mutante: serially rebooting hosts in the wtp104* range
  • 17:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:09 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1045.eqiad.wmnet
  • 17:08 mutante: wtp1046 - rebooting
  • 17:06 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1007.eqiad.wmnet
  • 17:06 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1007.eqiad.wmnet
  • 17:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1007.eqiad.wmnet with OS bullseye
  • 17:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 17:05 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:54 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
  • 16:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 16:51 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 16:49 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 16:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:48 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 16:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:43 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 16:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:41 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:38 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 16:36 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1007.eqiad.wmnet with OS bullseye
  • 16:35 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:35 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24123 and previous config saved to /var/cache/conftool/dbconfig/20220405-163454-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 16:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Upgrade dbstore1007 to bullseye
  • 16:32 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Upgrade dbstore1007 to bullseye
  • 16:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1005.eqiad.wmnet
  • 16:32 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1005.eqiad.wmnet
  • 16:19 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1005.eqiad.wmnet with OS bullseye
  • 16:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1090.eqiad.wmnet
  • 16:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:07 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1005.eqiad.wmnet with reason: host reimage
  • 16:02 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1005.eqiad.wmnet with reason: host reimage
  • 16:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 16:01 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1090.eqiad.wmnet
  • 15:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 15:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 15:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:52 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1005.eqiad.wmnet with OS bullseye
  • 15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 15:49 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Upgrade dbstore1005 to bullseye
  • 15:49 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Upgrade dbstore1005 to bullseye
  • 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:47 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1003.eqiad.wmnet
  • 15:47 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1003.eqiad.wmnet
  • 15:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 15:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2024.codfw.wmnet with reason: Rebooting for T303174
  • 15:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2024.codfw.wmnet with reason: Rebooting for T303174
  • 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 15:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 15:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 15:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 15:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.5/includes: Backport: ParserOutputAccess: Allow calling getPO with option of not saving in PC (T285993) (duration: 01m 00s)
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1089.eqiad.wmnet
  • 15:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:40 moritzm: drain ganeti2019 T305469
  • 15:39 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1003.eqiad.wmnet with OS bullseye
  • 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1088.eqiad.wmnet
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 15:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 15:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 15:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4036.ulsfo.wmnet
  • 15:26 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:26 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:25 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1003.eqiad.wmnet with reason: host reimage
  • 15:23 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1088.eqiad.wmnet
  • 15:23 mmandere: pool cp5007 with HAProxy as TLS termination layer - T290005
  • 15:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet with reason: Rebooting for T303174
  • 15:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet with reason: Rebooting for T303174
  • 15:20 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1003.eqiad.wmnet with reason: host reimage
  • 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5007.eqsin.wmnet with OS buster
  • 15:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 15:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:12 moritzm: installing atftp security updates
  • 15:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2022.codfw.wmnet with reason: Rebooting for T303174
  • 15:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2022.codfw.wmnet with reason: Rebooting for T303174
  • 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3065.esams.wmnet
  • 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1087.eqiad.wmnet
  • 15:10 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1003.eqiad.wmnet with OS bullseye
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2020.codfw.wmnet with reason: Rebooting for T303174
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2020.codfw.wmnet with reason: Rebooting for T303174
  • 15:02 mmandere: pool cp5013 with HAProxy as TLS termination layer - T290005
  • 15:01 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Taking host offline to upgrade to Bullseye
  • 15:01 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Taking host offline to upgrade to Bullseye
  • 15:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5013.eqsin.wmnet with OS buster
  • 14:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2013.codfw.wmnet with reason: Rebooting for T303174
  • 14:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2013.codfw.wmnet with reason: Rebooting for T303174
  • 14:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3065.esams.wmnet
  • 14:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1087.eqiad.wmnet
  • 14:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4036.ulsfo.wmnet
  • 14:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5007.eqsin.wmnet with reason: host reimage
  • 14:49 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2012.codfw.wmnet with reason: Rebooting for T303174
  • 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2012.codfw.wmnet with reason: Rebooting for T303174
  • 14:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5007.eqsin.wmnet with reason: host reimage
  • 14:44 vgutierrez: re-pool cp1086
  • 14:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2011.codfw.wmnet with reason: Rebooting for T303174
  • 14:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2011.codfw.wmnet with reason: Rebooting for T303174
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24122 and previous config saved to /var/cache/conftool/dbconfig/20220405-143316-ladsgroup.json
  • 14:31 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
  • 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster1001.eqiad.wmnet with reason: reimage
  • 14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster1001.eqiad.wmnet with reason: reimage
  • 14:31 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
  • 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2014.codfw.wmnet with reason: Rebooting for T303174
  • 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2014.codfw.wmnet with reason: Rebooting for T303174
  • 14:22 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5007.eqsin.wmnet with OS buster
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24121 and previous config saved to /var/cache/conftool/dbconfig/20220405-141811-ladsgroup.json
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:12 mmandere: depool cp5007 for reimage - T290005
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5013.eqsin.wmnet with OS buster
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable videojs on all of DIP wikis (T248418) (duration: 00m 53s)
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24120 and previous config saved to /var/cache/conftool/dbconfig/20220405-140306-ladsgroup.json
  • 13:58 mmandere: depool cp5013 for reimage - T290005
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3004.wikimedia.org
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24119 and previous config saved to /var/cache/conftool/dbconfig/20220405-134801-ladsgroup.json
  • 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deneb.codfw.wmnet
  • 13:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1086.eqiad.wmnet
  • 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3004.wikimedia.org
  • 13:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Cluster re-init for new IP ranges
  • 13:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Cluster re-init for new IP ranges
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deneb.codfw.wmnet
  • 13:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1086.eqiad.wmnet
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster2001.codfw.wmnet with reason: reimage
  • 13:23 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster2001.codfw.wmnet with reason: reimage
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
  • 13:20 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgUdp2logDest the same value as to $wmfUdp2logDest (T45956) (duration: 00m 54s)
  • 13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:17 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Pin CheckUser actor migration to old schema (T233004) (duration: 00m 54s)
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4030.ulsfo.wmnet
  • 13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4030.ulsfo.wmnet
  • 13:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3064.esams.wmnet
  • 13:03 moritzm: installing openssl updates from buster 10.12 point release
  • 13:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
  • 12:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
  • 12:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:54 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3064.esams.wmnet
  • 12:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1085.eqiad.wmnet
  • 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24117 and previous config saved to /var/cache/conftool/dbconfig/20220405-124745-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24116 and previous config saved to /var/cache/conftool/dbconfig/20220405-124732-ladsgroup.json
  • 12:46 mmandere: pool cp6007 with HAProxy as TLS termination layer - T290005
  • 12:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1085.eqiad.wmnet
  • 12:40 mmandere: pool cp5015 with HAProxy as TLS termination layer - T290005
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24115 and previous config saved to /var/cache/conftool/dbconfig/20220405-123227-ladsgroup.json
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:22 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS buster
  • 12:18 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24114 and previous config saved to /var/cache/conftool/dbconfig/20220405-121722-ladsgroup.json
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5015.eqsin.wmnet with OS buster
  • 11:56 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
  • 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:52 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.6 refs T305212
  • 11:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
  • 11:48 jnuche@deploy1002: Finished scap: resync wmf.6 to reapply security patches - T305212 (duration: 02m 50s)
  • 11:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
  • 11:45 jnuche@deploy1002: Started scap: resync wmf.6 to reapply security patches - T305212
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 T305427', diff saved to https://phabricator.wikimedia.org/P24112 and previous config saved to /var/cache/conftool/dbconfig/20220405-113944-root.json
  • 11:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS buster
  • 11:31 mmandere: depool cp6007 for reimage - T290005
  • 11:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:23 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5015.eqsin.wmnet with OS buster
  • 11:15 mmandere: depool cp5015 for reimage - T290005
  • 11:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:10 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24111 and previous config saved to /var/cache/conftool/dbconfig/20220405-110232-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24110 and previous config saved to /var/cache/conftool/dbconfig/20220405-110224-ladsgroup.json
  • 11:03 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:56 volans: installed spicerack v2.4.0 on the cumin hosts
  • 10:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24109 and previous config saved to /var/cache/conftool/dbconfig/20220405-104719-ladsgroup.json
  • 10:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 10:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24108 and previous config saved to /var/cache/conftool/dbconfig/20220405-103214-ladsgroup.json
  • 10:30 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:30 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:30 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:19 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:18 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24107 and previous config saved to /var/cache/conftool/dbconfig/20220405-101709-ladsgroup.json
  • 09:49 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24105 and previous config saved to /var/cache/conftool/dbconfig/20220405-091947-ladsgroup.json
  • 09:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24104 and previous config saved to /var/cache/conftool/dbconfig/20220405-091939-ladsgroup.json
  • 09:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.6"
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24103 and previous config saved to /var/cache/conftool/dbconfig/20220405-090434-ladsgroup.json
  • 08:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24102 and previous config saved to /var/cache/conftool/dbconfig/20220405-084928-ladsgroup.json
  • 08:49 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:35 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24101 and previous config saved to /var/cache/conftool/dbconfig/20220405-083423-ladsgroup.json
  • 08:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:31 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.6 refs T305212
  • 08:28 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:26 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dragonfly-supernode2001.codfw.wmnet
  • 08:23 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:21 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.6 refs T305212 (duration: 42m 53s)
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
  • 08:13 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:13 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:52 XioNoX: disable BGP to Tata in drmrs for circuit move - T298208
  • 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:38 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.6 refs T305212
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24100 and previous config saved to /var/cache/conftool/dbconfig/20220405-073617-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24099 and previous config saved to /var/cache/conftool/dbconfig/20220405-073608-ladsgroup.json
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24098 and previous config saved to /var/cache/conftool/dbconfig/20220405-072103-ladsgroup.json
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24097 and previous config saved to /var/cache/conftool/dbconfig/20220405-070558-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24096 and previous config saved to /var/cache/conftool/dbconfig/20220405-065053-ladsgroup.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1132 T301879', diff saved to https://phabricator.wikimedia.org/P24095 and previous config saved to /var/cache/conftool/dbconfig/20220405-063648-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 into API for testing T301879', diff saved to https://phabricator.wikimedia.org/P24094 and previous config saved to /var/cache/conftool/dbconfig/20220405-060124-marostegui.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 for testing T301879', diff saved to https://phabricator.wikimedia.org/P24093 and previous config saved to /var/cache/conftool/dbconfig/20220405-055256-marostegui.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24092 and previous config saved to /var/cache/conftool/dbconfig/20220405-054610-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24091 and previous config saved to /var/cache/conftool/dbconfig/20220405-054602-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24090 and previous config saved to /var/cache/conftool/dbconfig/20220405-053057-ladsgroup.json
  • 05:17 _joe_: uploading new minor version of conftool to apt for buster/bullseye (requestctl new feature)
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24089 and previous config saved to /var/cache/conftool/dbconfig/20220405-051552-ladsgroup.json
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24088 and previous config saved to /var/cache/conftool/dbconfig/20220405-050047-ladsgroup.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 for testing T301879', diff saved to https://phabricator.wikimedia.org/P24087 and previous config saved to /var/cache/conftool/dbconfig/20220405-043426-marostegui.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24086 and previous config saved to /var/cache/conftool/dbconfig/20220405-040309-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 04:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24085 and previous config saved to /var/cache/conftool/dbconfig/20220405-040301-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24084 and previous config saved to /var/cache/conftool/dbconfig/20220405-034756-ladsgroup.json
  • 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24083 and previous config saved to /var/cache/conftool/dbconfig/20220405-033251-ladsgroup.json
  • 03:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24082 and previous config saved to /var/cache/conftool/dbconfig/20220405-031745-ladsgroup.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24081 and previous config saved to /var/cache/conftool/dbconfig/20220405-022132-ladsgroup.json
  • 02:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24080 and previous config saved to /var/cache/conftool/dbconfig/20220405-022124-ladsgroup.json
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24079 and previous config saved to /var/cache/conftool/dbconfig/20220405-020619-ladsgroup.json
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: T305423
  • 01:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: T305423
  • 01:57 eileen: process control config revision changed from 06379640 to 25728a0e
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24078 and previous config saved to /var/cache/conftool/dbconfig/20220405-015114-ladsgroup.json
  • 01:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp5002.eqsin.wmnet
  • 01:42 eileen: civicrm revision changed from 84c737b6 to 87bc3114
  • 01:37 eileen: config revision changed from bb0e1af3 to 06379640
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24077 and previous config saved to /var/cache/conftool/dbconfig/20220405-013609-ladsgroup.json
  • 01:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet
  • 01:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet
  • 01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
  • 01:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet
  • 00:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4034.ulsfo.wmnet
  • 00:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5016.eqsin.wmnet
  • 00:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet
  • 00:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet
  • 00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4034.ulsfo.wmnet
  • 00:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet
  • 00:43 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5016.eqsin.wmnet
  • 00:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet
  • 00:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet
  • 00:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet
  • 00:39 mutante: gitlab1001 - moved 1648814678_2022_04_01_14.9.1_gitlab_backup.tar and other files from April 2nd/April 3rd from /srv/gitlab-backup to /mnt/gitlab-backup to prevent another outage due to disk space T274463
  • 00:36 mutante: gitlab2001 - apt-get clean to prevent disk space issues
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24076 and previous config saved to /var/cache/conftool/dbconfig/20220405-003419-ladsgroup.json
  • 00:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24075 and previous config saved to /var/cache/conftool/dbconfig/20220405-003405-ladsgroup.json
  • 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet
  • 00:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 00:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1047.eqiad.wmnet
  • 00:32 mutante: gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. Ran 'apt-get clean' to free 13G, which made it recover. T274463 - <+icinga-wm> RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK
  • 00:30 mutante: gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. Ran 'apt-get clean' to free 13G, which made it recover.
  • 00:27 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1048.eqiad.wmnet
  • 00:23 mutante: wtp1046, wtp1047, wtp1048 - rebooting, one at a time
  • 00:21 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp104[6-8].eqiad.wmnet
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24074 and previous config saved to /var/cache/conftool/dbconfig/20220405-001900-ladsgroup.json
  • 00:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet
  • 00:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet
  • 00:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24073 and previous config saved to /var/cache/conftool/dbconfig/20220405-000355-ladsgroup.json

2022-04-04

  • 23:51 mutante: apt1001 - importing gitlab-runner package for bullseye via: 'sudo -E reprepro --noskipold --component thirdparty/gitlab-runner update bullseye-wikimedia' after gerrit:767604 (T297659)
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24072 and previous config saved to /var/cache/conftool/dbconfig/20220404-234850-ladsgroup.json
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24071 and previous config saved to /var/cache/conftool/dbconfig/20220404-224836-ladsgroup.json
  • 22:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24070 and previous config saved to /var/cache/conftool/dbconfig/20220404-224828-ladsgroup.json
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24069 and previous config saved to /var/cache/conftool/dbconfig/20220404-223323-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24068 and previous config saved to /var/cache/conftool/dbconfig/20220404-221818-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24067 and previous config saved to /var/cache/conftool/dbconfig/20220404-220313-ladsgroup.json
  • 21:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
  • 21:14 mutante: puppetmaster1001/puppetmaster2003 - geoip / maxmind database update timers renamed: 'geoip_update_legacy' became 'geoip_update_main', and 'geoip_update' became 'geoip_update_ipinfo'. No longer using the confusing 'legacy' term, as suggested (T303464)
  • 21:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet
  • 21:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet
  • 21:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
  • 21:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet
  • 21:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24066 and previous config saved to /var/cache/conftool/dbconfig/20220404-205932-ladsgroup.json
  • 20:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 20:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24065 and previous config saved to /var/cache/conftool/dbconfig/20220404-205924-ladsgroup.json
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24064 and previous config saved to /var/cache/conftool/dbconfig/20220404-204419-ladsgroup.json
  • 20:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
  • 20:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
  • 20:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3061.esams.wmnet
  • 20:32 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
  • 20:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
  • 20:30 urbanecm: UTC late B&C window completed
  • 20:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3061.esams.wmnet
  • 20:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8c81de9: Remove wgWMEIPAddressCopyActionEnabled from Beta and production config (T296469) (duration: 00m 51s)
  • 20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24063 and previous config saved to /var/cache/conftool/dbconfig/20220404-202914-ladsgroup.json
  • 20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5006.eqsin.wmnet
  • 20:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:16 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5006.eqsin.wmnet
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24062 and previous config saved to /var/cache/conftool/dbconfig/20220404-201409-ladsgroup.json
  • 20:11 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
  • 20:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3060.esams.wmnet
  • 20:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
  • 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
  • 20:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp3060.esams.wmnet
  • 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
  • 19:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5005.eqsin.wmnet
  • 19:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
  • 19:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2040.codfw.wmnet
  • 19:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
  • 19:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2040.codfw.wmnet
  • 19:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
  • 19:38 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
  • 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1002.eqiad.wmnet
  • 19:35 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1002.eqiad.wmnet
  • 19:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2002.codfw.wmnet
  • 19:33 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon2002.codfw.wmnet
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
  • 19:22 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1001.eqiad.wmnet
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24061 and previous config saved to /var/cache/conftool/dbconfig/20220404-191750-ladsgroup.json
  • 19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24060 and previous config saved to /var/cache/conftool/dbconfig/20220404-191743-ladsgroup.json
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-tls
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-be
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=varnish-fe
  • 19:16 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1001.eqiad.wmnet
  • 19:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4026.ulsfo.wmnet
  • 19:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3059.esams.wmnet
  • 19:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp5005.eqsin.wmnet
  • 19:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4026.ulsfo.wmnet
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24059 and previous config saved to /var/cache/conftool/dbconfig/20220404-190238-ladsgroup.json
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3059.esams.wmnet
  • 18:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2039.codfw.wmnet
  • 18:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
  • 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
  • 18:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2039.codfw.wmnet
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24058 and previous config saved to /var/cache/conftool/dbconfig/20220404-184733-ladsgroup.json
  • 18:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3058.esams.wmnet
  • 18:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
  • 18:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4025.ulsfo.wmnet
  • 18:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
  • 18:39 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4025.ulsfo.wmnet
  • 18:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage2001.codfw.wmnet
  • 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3058.esams.wmnet
  • 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
  • 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
  • 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2038.codfw.wmnet
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24057 and previous config saved to /var/cache/conftool/dbconfig/20220404-183227-ladsgroup.json
  • 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
  • 18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2038.codfw.wmnet
  • 18:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
  • 18:25 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
  • 18:25 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
  • 18:25 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
  • 18:08 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
  • 17:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
  • 17:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 17:27 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24056 and previous config saved to /var/cache/conftool/dbconfig/20220404-172707-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24055 and previous config saved to /var/cache/conftool/dbconfig/20220404-172659-ladsgroup.json
  • 17:25 XioNoX: push urpf DHCP exception to all core routers with urpf configured - T285461
  • 17:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
  • 17:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2037.codfw.wmnet
  • 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2037.codfw.wmnet
  • 17:16 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24054 and previous config saved to /var/cache/conftool/dbconfig/20220404-171154-ladsgroup.json
  • 17:11 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:10 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24053 and previous config saved to /var/cache/conftool/dbconfig/20220404-165649-ladsgroup.json
  • 16:50 taavi: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Brand" "Brand/Archive" "Majavah" --reason 'phab:T305387' # T305387
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24052 and previous config saved to /var/cache/conftool/dbconfig/20220404-164144-ladsgroup.json
  • 16:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 16:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 16:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 16:09 volans: uploaded spicerack_2.4.0 to apt.wikimedia.org bullseye-wikimedia
  • 16:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:08 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 16:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:02 bblack@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
  • 16:00 bblack@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
  • 15:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 15:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 15:44 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24051 and previous config saved to /var/cache/conftool/dbconfig/20220404-153846-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24050 and previous config saved to /var/cache/conftool/dbconfig/20220404-153839-ladsgroup.json
  • 15:28 moritzm: remove stray debmonitor-server/cumin installs (cleanup of 548425b)
  • 15:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1002.eqiad.wmnet
  • 15:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24049 and previous config saved to /var/cache/conftool/dbconfig/20220404-152333-ladsgroup.json
  • 15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host releases1002.eqiad.wmnet
  • 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Use "unexpectedUnconnectedPage" page prop on Beta (production no-op) (duration: 00m 50s)
  • 15:17 mmandere: pool cp6015 with HAProxy as TLS termination layer - T290005
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24048 and previous config saved to /var/cache/conftool/dbconfig/20220404-150828-ladsgroup.json
  • 15:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
  • 15:05 mmandere: pool cp5008 with HAProxy as TLS termination layer - T290005
  • 15:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5008.eqsin.wmnet with OS buster
  • 14:55 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24047 and previous config saved to /var/cache/conftool/dbconfig/20220404-145323-ladsgroup.json
  • 14:44 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 14:44 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
  • 14:42 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 14:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 14:37 herron: rebooting alert2001
  • 14:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
  • 14:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2002.codfw.wmnet
  • 14:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host releases2002.codfw.wmnet
  • 14:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
  • 14:16 mmandere: depool cp6015 for reimage - T290005
  • 14:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5008.eqsin.wmnet with OS buster
  • 14:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 13:58 mmandere: depool cp5008 for reimage - T290005
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24045 and previous config saved to /var/cache/conftool/dbconfig/20220404-135314-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24044 and previous config saved to /var/cache/conftool/dbconfig/20220404-135307-ladsgroup.json
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
  • 13:44 mmandere: pool cp3055 with HAProxy as TLS termination layer - T290005
  • 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24043 and previous config saved to /var/cache/conftool/dbconfig/20220404-133801-ladsgroup.json
  • 13:35 mmandere: pool cp4022 with HAProxy as TLS termination layer - T290005
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
  • 13:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS buster
  • 13:31 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4022.ulsfo.wmnet with OS buster
  • 13:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
  • 13:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24042 and previous config saved to /var/cache/conftool/dbconfig/20220404-132256-ladsgroup.json
  • 13:20 urbanecm: UTC afternoon B&C window done
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 daniel@deploy1002: Synchronized multiversion/defines.php: Config: Always set MW_USE_CONFIG_SCHEMA. (T305176) (duration: 00m 50s)
  • 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24041 and previous config saved to /var/cache/conftool/dbconfig/20220404-130751-ladsgroup.json
  • 13:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:05 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 13:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7ebad8f: Add logo variants for zhwiki (T273578) (duration: 00m 51s)
  • 13:04 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
  • 12:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
  • 12:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 12:48 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4022.ulsfo.wmnet with OS buster
  • 12:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 12:43 moritzm: installing gmp security updates
  • 12:42 mmandere: depool cp4022 for reimage - T290005
  • 12:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS buster
  • 12:35 ottomata: removing retention.ms override from eventstreams publicly exposed topics in kafka main-eqiad and main-codfw - T241178
  • 12:31 mmandere: depool cp3055 for reimage - T290005
  • 12:31 ottomata: deleting empty typo topics from kafka main-eqiad: eqiad.mediawiki.page-edit (found while working on T241178)
  • 12:26 ottomata: deleting empty typo topics from kafka main-codfw: codfw.mediawiki.page_delete, codfw.mediawiki.page_move, codfw.mediawiki.page_restore, codfw.mediawiki.revision_create, codfw.mediawiki.revision_visibility_set, codfw.mediawiki.user_block (found while working on T241178)
  • 12:18 moritzm: installing expat updates (followups to earlier security fixes, no security impact by itself)
  • 12:11 mmandere: pool cp4028 with HAProxy as TLS termination layer - T290005
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24040 and previous config saved to /var/cache/conftool/dbconfig/20220404-121030-ladsgroup.json
  • 12:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24039 and previous config saved to /var/cache/conftool/dbconfig/20220404-121022-ladsgroup.json
  • 12:05 mmandere: pool cp3054 with HAProxy as TLS termination layer - T290005
  • 12:04 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4028.ulsfo.wmnet with OS buster
  • 12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS buster
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24038 and previous config saved to /var/cache/conftool/dbconfig/20220404-115516-ladsgroup.json
  • 11:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24037 and previous config saved to /var/cache/conftool/dbconfig/20220404-114011-ladsgroup.json
  • 11:39 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:37 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
  • 11:37 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
  • 11:34 moritzm: installing zziplib security updates
  • 11:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
  • 11:27 moritzm: installing jbig2dec security updates
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24036 and previous config saved to /var/cache/conftool/dbconfig/20220404-112506-ladsgroup.json
  • 11:20 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4028.ulsfo.wmnet with OS buster
  • 11:18 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:12 mmandere: depool cp4028 for reimage - T290005
  • 11:11 volans: deploying python3-wmflib 1.2.0 fleet-wide
  • 11:09 jforrester@deploy1002: Finished deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block (duration: 00m 08s)
  • 11:09 jforrester@deploy1002: Started deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block
  • 11:07 moritzm: installing cups security updates on buster (client side tools/libs)
  • 11:04 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS buster
  • 10:53 mmandere: depool cp3054 for reimage - T290005
  • 10:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1003.eqiad.wmnet
  • 10:38 volans: uploaded python3-wmflib_1.2.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 10:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1003.eqiad.wmnet
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24035 and previous config saved to /var/cache/conftool/dbconfig/20220404-102616-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24034 and previous config saved to /var/cache/conftool/dbconfig/20220404-102609-ladsgroup.json
  • 10:26 moritzm: installing libxml2 security updates
  • 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1004.eqiad.wmnet
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24033 and previous config saved to /var/cache/conftool/dbconfig/20220404-101104-ladsgroup.json
  • 10:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1004.eqiad.wmnet
  • 10:08 moritzm: installing icu bugfix updates from buster 10.12 point release
  • 09:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1005.eqiad.wmnet
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24032 and previous config saved to /var/cache/conftool/dbconfig/20220404-095558-ladsgroup.json
  • 09:55 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
  • 09:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1005.eqiad.wmnet
  • 09:51 mmandere: pool cp6008 with HAProxy as TLS termination layer - T290005
  • 09:48 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
  • 09:47 moritzm: installing zlib security updates
  • 09:44 mmandere: pool cp5003 with HAProxy as TLS termination layer - T290005
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24031 and previous config saved to /var/cache/conftool/dbconfig/20220404-094053-ladsgroup.json
  • 09:31 moritzm: rolling restart of FPM/Apache on mw canaries to pick up updated zlib/glibc/openssl/libxml
  • 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 09:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 09:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
  • 09:26 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 09:25 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5003.eqsin.wmnet with OS buster
  • 09:16 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 09:12 moritzm: installing openssl updates from Buster 10.12 point release
  • 09:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 08:59 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 08:59 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
  • 08:56 moritzm: installing glibc updates from buster 10.12 point release
  • 08:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P24030 and previous config saved to /var/cache/conftool/dbconfig/20220404-084523-root.json
  • 08:43 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:42 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
  • 08:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:37 moritzm: installing flac security updates
  • 08:37 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:37 mmandere: depool cp6008 for reimage - T290005
  • 08:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:31 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:31 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24029 and previous config saved to /var/cache/conftool/dbconfig/20220404-083031-ladsgroup.json
  • 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5003.eqsin.wmnet with OS buster
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:25 urbanecm@deploy1002: Synchronized logos/config.yaml: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (3/3) (duration: 00m 50s)
  • 08:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (2/3) (duration: 00m 50s)
  • 08:23 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (1/3) (duration: 00m 51s)
  • 08:19 mmandere: depool cp5003 for reimage - T290005
  • 08:02 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 14s)
  • 08:01 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 07:54 jayme: imported scap 4.6.0 to stretch-/buster-/bullseye-wikimedia - T305250
  • 07:44 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:43 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 07:43 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 07:43 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 07:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:39 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:23 taavi: UTC morning deployments done
  • 07:21 taavi@deploy1002: Synchronized wmf-config/throttle.php: Config: throttle: removed expired rule (T304836) (duration: 00m 49s)
  • 07:19 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: Revert "fawiki: Set celebration logo for new vector" (T304314) (duration: 00m 49s)
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "fawiki: Set celebration logo for new vector" (T304314) (duration: 00m 50s)
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:15 taavi@deploy1002: Synchronized static/images/project-logos: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 50s)
  • 07:14 taavi@deploy1002: Synchronized logos/config.yaml: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 50s)
  • 07:13 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 51s)
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content and Section Translation for Persian Wikipedia (T296475) (duration: 00m 51s)
  • 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24027 and previous config saved to /var/cache/conftool/dbconfig/20220404-060542-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24026 and previous config saved to /var/cache/conftool/dbconfig/20220404-055037-ladsgroup.json
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1130.eqiad.wmnet with OS bullseye
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24025 and previous config saved to /var/cache/conftool/dbconfig/20220404-053531-ladsgroup.json
  • 05:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
  • 05:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24024 and previous config saved to /var/cache/conftool/dbconfig/20220404-052026-ladsgroup.json
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1130.eqiad.wmnet with OS bullseye
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24023 and previous config saved to /var/cache/conftool/dbconfig/20220404-041545-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 03:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 02:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 02:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance

2022-04-02

  • 11:26 akosiaris: disable zotero paging until T291707 is resolved.
  • 11:11 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 11:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync

2022-04-01

  • 23:25 mutante: DNS - new project language 'kcg'. 'Tyap is a regionally important dialect cluster of Plateau languages in Nigeria's Middle Belt, named after its prestige dialect. It is also known by its Hausa exonym as Katab or Kataf.' T305279
  • 23:08 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 23:08 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 22:04 bblack: esams re-pooled - T304089
  • 20:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:48 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
  • 19:47 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
  • 19:44 mutante: rebooting parsoid canary appservers - wtp1025, wtp1026, parse2001, parse2002
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].eqiad.wmnet
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=parse200[1-2].eqiad.wmnet
  • 19:37 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
  • 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
  • 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1450.eqiad.wmnet
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=varnish-fe
  • 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-tls
  • 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-be
  • 19:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
  • 19:16 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
  • 19:01 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
  • 19:00 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp2036.codfw.wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1414.wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw1414.wmnet
  • 18:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw141[4-8].wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2036.codfw.wmnet
  • 13:05 dcausse: resetting jvmquake flag on all wdqs hosts
  • 12:52 dcausse: restarting blazegraph on wdqs1006 and resetting jvmquake warning flag
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 11:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 10:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 10:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 10:47 vgutierrez: reboot acme-chief instances to catch up on kernel upgrades
  • 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6002.drmrs.wmnet
  • 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6002.drmrs.wmnet
  • 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6001.drmrs.wmnet
  • 10:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6001.drmrs.wmnet
  • 10:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet
  • 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet
  • 10:06 vgutierrez: vgutierrez@puppetmaster2001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
  • 10:04 vgutierrez: vgutierrez@puppetmaster1001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
  • 10:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet
  • 09:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet
  • 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet
  • 09:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet
  • 09:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet
  • 09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet
  • 09:35 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir3002.esams.wmnet
  • 09:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet
  • 09:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet
  • 09:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet
  • 09:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
  • 09:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
  • 09:10 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir2001.codfw.wmnet
  • 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2001.codfw.wmnet
  • 08:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet
  • 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet
  • 08:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet
  • 08:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
  • 08:48 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ncredir1001.eqiad.wmnet
  • 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
  • 08:44 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 08:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:42 vgutierrez: rolling restart of ncredir instances to catch up on kernel upgrades
  • 06:54 XioNoX: traffic engineering in drmrs to prevent link saturation

Archives

See Server Admin Log/Archives.