Server Admin Log

Revision as of 00:37, 14 April 2022 by imported>Stashbot (ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance)

2022-04-14

  • 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24625 and previous config saved to /var/cache/conftool/dbconfig/20220414-003750-ladsgroup.json
  • 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24624 and previous config saved to /var/cache/conftool/dbconfig/20220414-002245-ladsgroup.json
  • 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24623 and previous config saved to /var/cache/conftool/dbconfig/20220414-000740-ladsgroup.json
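Entries on this page share one shape: a `HH:MM` timestamp, an actor (a nick or `user@host`), a colon, and a free-form message. Cookbook runs are logged as a `START` entry followed by an `END (PASS|FAIL|ERROR)` entry carrying an `exit_code`. The following is a minimal sketch, not an official tool, of how such lines could be parsed for auditing. The names `Entry` and `parse_entry` and the regular expressions are assumptions inferred from the entries shown here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed entry shape: optional bullet, "HH:MM", actor, ":", message.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2})\s+(?P<actor>[^\s:]+):\s+(?P<message>.*)$"
)
# Assumed cookbook shape: "START - Cookbook <name>" or "END (<RESULT>) - Cookbook <name>".
COOKBOOK_RE = re.compile(
    r"^(?P<phase>START|END)(?: \((?P<result>[A-Z]+)\))? - Cookbook (?P<name>\S+)"
)
EXIT_RE = re.compile(r"\(exit_code=(?P<code>\d+)\)")


@dataclass
class Entry:
    time: str
    actor: str
    message: str
    cookbook: Optional[str] = None   # e.g. "sre.hosts.downtime"
    phase: Optional[str] = None      # "START" or "END"
    result: Optional[str] = None     # "PASS", "FAIL", "ERROR" (END entries only)
    exit_code: Optional[int] = None  # present on END entries only


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one log line; return None for date headings and blank lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    entry = Entry(m["time"], m["actor"], m["message"])
    cb = COOKBOOK_RE.match(entry.message)
    if cb is not None:
        entry.cookbook = cb["name"]
        entry.phase = cb["phase"]
        entry.result = cb["result"]
        code = EXIT_RE.search(entry.message)
        if code is not None:
            entry.exit_code = int(code["code"])
    return entry


def failed_cookbooks(lines):
    """Yield END entries whose exit code is non-zero."""
    for line in lines:
        entry = parse_entry(line)
        if entry and entry.phase == "END" and entry.exit_code not in (None, 0):
            yield entry
```

Passing the page's lines to `failed_cookbooks` would surface runs such as the `END (FAIL)` and `END (ERROR)` entries logged on 2022-04-13, without relying on the result label alone.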

2022-04-13

  • 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24622 and previous config saved to /var/cache/conftool/dbconfig/20220413-235235-ladsgroup.json
  • 23:29 ejegg: updated payments-wiki from c4cab5b1 to c8fee00c
  • 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24620 and previous config saved to /var/cache/conftool/dbconfig/20220413-225612-ladsgroup.json
  • 22:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 22:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 22:31 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1021.eqiad.wmnet with OS bullseye
  • 22:30 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1020.eqiad.wmnet with OS bullseye
  • 22:15 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1019.eqiad.wmnet with OS bullseye
  • 22:12 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1018.eqiad.wmnet with OS bullseye
  • 22:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1020.eqiad.wmnet with reason: host reimage
  • 22:07 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1021.eqiad.wmnet with OS bullseye
  • 22:06 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
  • 22:06 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
  • 22:05 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1020.eqiad.wmnet with reason: host reimage
  • 22:04 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
  • 22:04 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: Upgrade to bullseye
  • 22:03 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1019.eqiad.wmnet with reason: host reimage
  • 22:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 22:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 21:59 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1019.eqiad.wmnet with reason: host reimage
  • 21:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage
  • 21:54 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1020.eqiad.wmnet with OS bullseye
  • 21:53 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage
  • 21:51 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Upgrade to bullseye
  • 21:51 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Upgrade to bullseye
  • 21:48 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1019.eqiad.wmnet with OS bullseye
  • 21:47 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1017.eqiad.wmnet with OS bullseye
  • 21:47 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1019.eqiad.wmnet with reason: Upgrade to bullseye
  • 21:47 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1019.eqiad.wmnet with reason: Upgrade to bullseye
  • 21:42 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1018.eqiad.wmnet with OS bullseye
  • 21:41 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1018.eqiad.wmnet with reason: Upgrade to bullseye
  • 21:41 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1018.eqiad.wmnet with reason: Upgrade to bullseye
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1017.eqiad.wmnet with reason: host reimage
  • 21:30 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.7/maintenance/migrateLinksTable.php: Backport: MigrateLinksTable: Avoid dynamic loading of list columns to select (T299424) (duration: 00m 55s)
  • 21:29 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1017.eqiad.wmnet with reason: host reimage
  • 21:18 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1017.eqiad.wmnet with OS bullseye
  • 21:16 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Upgrade to bullseye
  • 21:16 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Upgrade to bullseye
  • 21:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 21:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24618 and previous config saved to /var/cache/conftool/dbconfig/20220413-211546-ladsgroup.json
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24617 and previous config saved to /var/cache/conftool/dbconfig/20220413-210041-ladsgroup.json
  • 20:52 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1016.eqiad.wmnet with OS bullseye
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24616 and previous config saved to /var/cache/conftool/dbconfig/20220413-204535-ladsgroup.json
  • 20:36 razzi@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on clouddb1016.eqiad.wmnet with reason: host reimage
  • 20:34 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1016.eqiad.wmnet with reason: host reimage
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24615 and previous config saved to /var/cache/conftool/dbconfig/20220413-203030-ladsgroup.json
  • 20:23 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1016.eqiad.wmnet with OS bullseye
  • 20:17 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Upgrade to bullseye
  • 20:17 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Upgrade to bullseye
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dfe0b9c: fawiki: Change logo for 900K milestone (T306030; 2/2) (duration: 00m 54s)
  • 20:12 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-fa-900K.svg: dfe0b9c: fawiki: Change logo for 900K milestone (T306030; 1/2) (duration: 00m 56s)
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 076e6ef: Optimize logo for Wikispecies (T306037; 2/2) (duration: 00m 53s)
  • 20:09 urbanecm@deploy1002: Synchronized static/images/project-logos/: 076e6ef: Optimize logo for Wikispecies (T306037; 1/2) (duration: 00m 55s)
  • 19:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24614 and previous config saved to /var/cache/conftool/dbconfig/20220413-193250-ladsgroup.json
  • 19:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24613 and previous config saved to /var/cache/conftool/dbconfig/20220413-193236-ladsgroup.json
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24612 and previous config saved to /var/cache/conftool/dbconfig/20220413-191731-ladsgroup.json
  • 19:15 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1015.eqiad.wmnet with OS bullseye
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24611 and previous config saved to /var/cache/conftool/dbconfig/20220413-190226-ladsgroup.json
  • 18:51 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1015.eqiad.wmnet with reason: host reimage
  • 18:48 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1015.eqiad.wmnet with reason: host reimage
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24610 and previous config saved to /var/cache/conftool/dbconfig/20220413-184721-ladsgroup.json
  • 18:36 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1015.eqiad.wmnet with OS bullseye
  • 18:34 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1015.eqiad.wmnet with reason: Upgrade to bullseye
  • 18:33 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1015.eqiad.wmnet with reason: Upgrade to bullseye
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:21 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.7 refs T305213 (duration: 00m 56s)
  • 18:20 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.7 refs T305213
  • 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:17 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.7 refs T305213
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24609 and previous config saved to /var/cache/conftool/dbconfig/20220413-175430-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24608 and previous config saved to /var/cache/conftool/dbconfig/20220413-175422-ladsgroup.json
  • 17:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24607 and previous config saved to /var/cache/conftool/dbconfig/20220413-173917-ladsgroup.json
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24606 and previous config saved to /var/cache/conftool/dbconfig/20220413-172412-ladsgroup.json
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:10 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining Wikidata clients of s7 (with --batch-size 250).
  • 17:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24605 and previous config saved to /var/cache/conftool/dbconfig/20220413-170907-ladsgroup.json
  • 17:06 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining Wikidata clients of s5 (with --batch-size 250).
  • 17:03 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining Wikidata clients of s3 (with --batch-size 250).
  • 16:50 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all Wikidata clients of s2 (with --batch-size 250).
  • 16:48 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1002.eqiad.wmnet
  • 16:40 razzi: reboot an-launcher1002 for security updates
  • 16:39 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-launcher1002.eqiad.wmnet
  • 16:26 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of ruwikinews (for 5M pages each).
  • 16:20 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of metawiki (for 5M pages each).
  • 16:18 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of viwiki (for 5M pages each).
  • 16:13 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of cebwiki (for 5M pages each).
  • 16:13 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T306045 (duration: 00m 55s)
  • 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24604 and previous config saved to /var/cache/conftool/dbconfig/20220413-161245-ladsgroup.json
  • 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 16:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:06 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of ruwiki (for 5M pages each).
  • 16:04 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of jawiki (for 5M pages each).
  • 16:02 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1308.eqiad.wmnet
  • 16:01 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of frwiki (for 5M pages each).
  • 15:52 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of wikidatawiki (for 5M pages each).
  • 15:51 hoo: Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of enwiki (for 5M pages each).
  • 15:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubestage2002.codfw.wmnet with reason: moving to a different rack
  • 15:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubestage2002.codfw.wmnet with reason: moving to a different rack
  • 15:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mc2023.codfw.wmnet with reason: moving to a different rack
  • 15:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mc2023.codfw.wmnet with reason: moving to a different rack
  • 15:45 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudstore1010.wikimedia.org with OS bullseye
  • 15:37 otto@deploy1002: Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 03s)
  • 15:37 otto@deploy1002: Started deploy [airflow-dags/research@b029f10]: (no justification provided)
  • 15:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudstore1010.wikimedia.org with OS bullseye
  • 15:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24603 and previous config saved to /var/cache/conftool/dbconfig/20220413-152504-ladsgroup.json
  • 15:23 otto@deploy1002: Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 10s)
  • 15:23 otto@deploy1002: Started deploy [airflow-dags/research@b029f10]: (no justification provided)
  • 15:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 110000001
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24602 and previous config saved to /var/cache/conftool/dbconfig/20220413-150959-ladsgroup.json
  • 15:07 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 100000001 --last-page-id 110000000
  • 15:03 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 90000001 --last-page-id 100000000
  • 14:58 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 80000001 --last-page-id 90000000
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24601 and previous config saved to /var/cache/conftool/dbconfig/20220413-145453-ladsgroup.json
  • 14:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 70000001 --last-page-id 80000000
  • 14:50 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 60000001 --last-page-id 70000000
  • 14:46 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 50000001 --last-page-id 60000000
  • 14:41 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 40000001 --last-page-id 50000000
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24600 and previous config saved to /var/cache/conftool/dbconfig/20220413-143948-ladsgroup.json
  • 14:36 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --batch-size 500 --first-page-id 30000001 --last-page-id 40000000
  • 14:31 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --first-page-id 20000001 --last-page-id 30000000
  • 14:31 jynus: bacula restarts finished
  • 14:31 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --first-page-id 10000001 --last-page-id 20000000
  • 14:27 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
  • 14:24 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
  • 14:23 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php commonswiki --last-page-id 10000000
  • 14:23 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
  • 14:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
  • 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Use "unexpectedUnconnectedPage" page prop on wikidataclient-test (duration: 00m 55s)
  • 14:07 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ foreachwikiindblist wikidataclient-test extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php
  • 14:06 otto@deploy1002: Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 04s)
  • 14:06 otto@deploy1002: Started deploy [airflow-dags/research@b029f10]: (no justification provided)
  • 14:05 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php testwiki
  • 13:58 jynus: restarting bacula hosts
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24599 and previous config saved to /var/cache/conftool/dbconfig/20220413-134613-ladsgroup.json
  • 13:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 13:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24598 and previous config saved to /var/cache/conftool/dbconfig/20220413-134605-ladsgroup.json
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:34 milimetric@deploy1002: Finished deploy [analytics/refinery@34be9f3] (thin): Regular analytics weekly train THIN [analytics/refinery@34be9f3] (duration: 00m 07s)
  • 13:34 milimetric@deploy1002: Started deploy [analytics/refinery@34be9f3] (thin): Regular analytics weekly train THIN [analytics/refinery@34be9f3]
  • 13:33 volans: installed spicerack v2.4.1 on cumin1001
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24597 and previous config saved to /var/cache/conftool/dbconfig/20220413-133100-ladsgroup.json
  • 13:30 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set templatelinks migration schema to write both in s4 - T299421 (duration: 00m 55s)
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:27 reedy@deploy1002: Synchronized wmf-config/: Migrate $wmfUdp2logDest to $wmgUdp2logDest - T45956 (duration: 00m 55s)
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:24 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T304438 (duration: 01m 03s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:19 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
  • 13:16 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use namespaced GerritExtDistProvider (duration: 00m 55s)
  • 13:16 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24596 and previous config saved to /var/cache/conftool/dbconfig/20220413-131555-ladsgroup.json
  • 13:15 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin1001 - T301955
  • 13:14 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin1001 - T301955
  • 13:13 otto@deploy1002: Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 34s)
  • 13:13 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
  • 13:13 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge testing - bking@cumin2002 - T301955
  • 13:13 otto@deploy1002: Started deploy [airflow-dags/research@b029f10]: (no justification provided)
  • 13:10 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest[1001-1002].eqiad.wmnet with reason: testing spicerack
  • 13:10 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on sretest[1001-1002].eqiad.wmnet with reason: testing spicerack
  • 13:04 volans: installed spicerack v2.4.1 on cumin2002
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24595 and previous config saved to /var/cache/conftool/dbconfig/20220413-130050-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24594 and previous config saved to /var/cache/conftool/dbconfig/20220413-120704-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24593 and previous config saved to /var/cache/conftool/dbconfig/20220413-120656-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24592 and previous config saved to /var/cache/conftool/dbconfig/20220413-115151-ladsgroup.json
  • 11:46 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
  • 11:40 topranks: Remove IPv6 router-advertisement config for fxp0 management interface on cr1-drmrs.
  • 11:38 gmodena@deploy1002: Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 00m 07s)
  • 11:38 gmodena@deploy1002: Started deploy [airflow-dags/research@b029f10]: (no justification provided)
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24591 and previous config saved to /var/cache/conftool/dbconfig/20220413-113645-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24590 and previous config saved to /var/cache/conftool/dbconfig/20220413-112140-ladsgroup.json
  • 10:46 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 10:46 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 10:42 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 10:41 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 10:40 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:40 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24589 and previous config saved to /var/cache/conftool/dbconfig/20220413-102904-ladsgroup.json
  • 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24588 and previous config saved to /var/cache/conftool/dbconfig/20220413-102856-ladsgroup.json
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24587 and previous config saved to /var/cache/conftool/dbconfig/20220413-101351-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24586 and previous config saved to /var/cache/conftool/dbconfig/20220413-095846-ladsgroup.json
  • 09:44 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24585 and previous config saved to /var/cache/conftool/dbconfig/20220413-094341-ladsgroup.json
  • 09:43 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 09:24 jnuche@deploy1002: Finished deploy [restbase/deploy@627f7d7] (dev-cluster): (no justification provided) (duration: 02m 51s)
  • 09:21 jnuche@deploy1002: Started deploy [restbase/deploy@627f7d7] (dev-cluster): (no justification provided)
  • 09:14 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 09:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24582 and previous config saved to /var/cache/conftool/dbconfig/20220413-084749-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 08:46 jayme@deploy1002: Finished deploy [restbase/deploy@627f7d7] (dev-cluster): (no justification provided) (duration: 02m 41s)
  • 08:44 jayme@deploy1002: Started deploy [restbase/deploy@627f7d7] (dev-cluster): (no justification provided)
  • 08:41 jayme: imported scap 4.6.1 to stretch-/buster-/bullseye-wikimedia - T305949
  • 08:41 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:41 ayounsi@cumin2002: START - Cookbook sre.network.cf
  • 08:41 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:41 ayounsi@cumin2002: START - Cookbook sre.network.cf
  • 08:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24581 and previous config saved to /var/cache/conftool/dbconfig/20220413-080040-ladsgroup.json
  • 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24580 and previous config saved to /var/cache/conftool/dbconfig/20220413-074534-ladsgroup.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2072 db2085:3311', diff saved to https://phabricator.wikimedia.org/P24579 and previous config saved to /var/cache/conftool/dbconfig/20220413-073119-root.json
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24578 and previous config saved to /var/cache/conftool/dbconfig/20220413-073029-ladsgroup.json
  • 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24577 and previous config saved to /var/cache/conftool/dbconfig/20220413-071524-ladsgroup.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P24576 and previous config saved to /var/cache/conftool/dbconfig/20220413-071506-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24575 and previous config saved to /var/cache/conftool/dbconfig/20220413-071445-root.json
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add SectionTranslation entry points as campaigns (T298029) (duration: 01m 03s)
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P24574 and previous config saved to /var/cache/conftool/dbconfig/20220413-070002-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24573 and previous config saved to /var/cache/conftool/dbconfig/20220413-065941-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P24572 and previous config saved to /var/cache/conftool/dbconfig/20220413-064459-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24571 and previous config saved to /var/cache/conftool/dbconfig/20220413-064437-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P24570 and previous config saved to /var/cache/conftool/dbconfig/20220413-062955-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24569 and previous config saved to /var/cache/conftool/dbconfig/20220413-062933-root.json
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24568 and previous config saved to /var/cache/conftool/dbconfig/20220413-061815-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24567 and previous config saved to /var/cache/conftool/dbconfig/20220413-061803-ladsgroup.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P24566 and previous config saved to /var/cache/conftool/dbconfig/20220413-061451-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24565 and previous config saved to /var/cache/conftool/dbconfig/20220413-061429-root.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24564 and previous config saved to /var/cache/conftool/dbconfig/20220413-060258-ladsgroup.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P24563 and previous config saved to /var/cache/conftool/dbconfig/20220413-055947-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 5%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24562 and previous config saved to /var/cache/conftool/dbconfig/20220413-055925-root.json
  • 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24561 and previous config saved to /var/cache/conftool/dbconfig/20220413-054753-ladsgroup.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2130 db2088:3311', diff saved to https://phabricator.wikimedia.org/P24560 and previous config saved to /var/cache/conftool/dbconfig/20220413-054739-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P24559 and previous config saved to /var/cache/conftool/dbconfig/20220413-054443-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 1%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24558 and previous config saved to /var/cache/conftool/dbconfig/20220413-054422-root.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181 for reboot T306001', diff saved to https://phabricator.wikimedia.org/P24557 and previous config saved to /var/cache/conftool/dbconfig/20220413-053526-root.json
  • 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24556 and previous config saved to /var/cache/conftool/dbconfig/20220413-053248-ladsgroup.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 1%: After schema changes', diff saved to https://phabricator.wikimedia.org/P24554 and previous config saved to /var/cache/conftool/dbconfig/20220413-051238-root.json
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311', diff saved to https://phabricator.wikimedia.org/P24553 and previous config saved to /var/cache/conftool/dbconfig/20220413-045646-root.json
  • 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24552 and previous config saved to /var/cache/conftool/dbconfig/20220413-042723-ladsgroup.json
  • 04:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 04:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 03:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 03:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 03:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 03:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24551 and previous config saved to /var/cache/conftool/dbconfig/20220413-033906-ladsgroup.json
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24550 and previous config saved to /var/cache/conftool/dbconfig/20220413-032400-ladsgroup.json
  • 03:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24549 and previous config saved to /var/cache/conftool/dbconfig/20220413-030855-ladsgroup.json
  • 02:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24548 and previous config saved to /var/cache/conftool/dbconfig/20220413-025350-ladsgroup.json
  • 01:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24547 and previous config saved to /var/cache/conftool/dbconfig/20220413-015727-ladsgroup.json
  • 01:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 01:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 01:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24546 and previous config saved to /var/cache/conftool/dbconfig/20220413-015719-ladsgroup.json
  • 01:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24545 and previous config saved to /var/cache/conftool/dbconfig/20220413-014214-ladsgroup.json
  • 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24544 and previous config saved to /var/cache/conftool/dbconfig/20220413-012709-ladsgroup.json
  • 01:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2033.codfw.wmnet with OS stretch
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24542 and previous config saved to /var/cache/conftool/dbconfig/20220413-011204-ladsgroup.json
  • 01:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
  • 00:59 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2033.codfw.wmnet with reason: host reimage
  • 00:44 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2033.codfw.wmnet with OS stretch
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24541 and previous config saved to /var/cache/conftool/dbconfig/20220413-001811-ladsgroup.json
  • 00:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24540 and previous config saved to /var/cache/conftool/dbconfig/20220413-001803-ladsgroup.json
  • 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24539 and previous config saved to /var/cache/conftool/dbconfig/20220413-000258-ladsgroup.json

2022-04-12

  • 23:48 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: Upgrading Elasticsearch to 6.8 in CODFW - bking@cumin1001 - T301958
  • 23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24538 and previous config saved to /var/cache/conftool/dbconfig/20220412-234753-ladsgroup.json
  • 23:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24537 and previous config saved to /var/cache/conftool/dbconfig/20220412-233248-ladsgroup.json
  • 23:23 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1014.eqiad.wmnet with OS bullseye
  • 23:03 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1014.eqiad.wmnet with reason: host reimage
  • 22:59 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1014.eqiad.wmnet with reason: host reimage
  • 22:48 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1014.eqiad.wmnet with OS bullseye
  • 22:46 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1014.eqiad.wmnet with reason: Upgrade to bullseye
  • 22:46 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1014.eqiad.wmnet with reason: Upgrade to bullseye
  • 22:39 ryankemper: T305646 Re-enabling puppet on `elastic2033`; still need to unban from elasticsearch cluster tomorrow
  • 22:34 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: Upgrading Elasticsearch to 6.8 in CODFW - bking@cumin1001 - T301958
  • 22:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24536 and previous config saved to /var/cache/conftool/dbconfig/20220412-223206-ladsgroup.json
  • 22:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 22:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24535 and previous config saved to /var/cache/conftool/dbconfig/20220412-223158-ladsgroup.json
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24534 and previous config saved to /var/cache/conftool/dbconfig/20220412-221652-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24533 and previous config saved to /var/cache/conftool/dbconfig/20220412-220147-ladsgroup.json
  • 21:59 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: Upgrading Elasticsearch to 6.8 in CODFW - bking@cumin1001 - T301958
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24531 and previous config saved to /var/cache/conftool/dbconfig/20220412-214642-ladsgroup.json
  • 21:37 milimetric@deploy1002: Finished deploy [analytics/refinery@34be9f3]: Regular analytics weekly train [analytics/refinery@34be9f3] (duration: 21m 24s)
  • 21:16 milimetric@deploy1002: Started deploy [analytics/refinery@34be9f3]: Regular analytics weekly train [analytics/refinery@34be9f3]
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:13 razzi: razzi@clouddb1013:~$ sudo systemctl reset-failed wmf-pt-kill.service - the wmf-pt-kill@<section>.service units are running fine
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24530 and previous config saved to /var/cache/conftool/dbconfig/20220412-205414-ladsgroup.json
  • 20:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 20:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 20:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24529 and previous config saved to /var/cache/conftool/dbconfig/20220412-205406-ladsgroup.json
  • 20:41 sbassett: re-deploy security patch for T226212 to wmf.6 - part 2
  • 20:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24528 and previous config saved to /var/cache/conftool/dbconfig/20220412-203900-ladsgroup.json
  • 20:39 sbassett: re-deploy security patch for T226212 to wmf.6 - part 1
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 cjming: end of UTC late backport & config window
  • 20:27 cjming@deploy1002: Synchronized wmf-config: Config: Stop setting $wgMultiContentRevisionSchemaMigrationStage (T231674) (duration: 01m 33s)
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24527 and previous config saved to /var/cache/conftool/dbconfig/20220412-202355-ladsgroup.json
  • 20:23 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@a68eaf2]: Fixes date format in path to dumps files (duration: 00m 07s)
  • 20:23 milimetric@deploy1002: Started deploy [airflow-dags/analytics@a68eaf2]: Fixes date format in path to dumps files
  • 20:21 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@a68eaf2]: (no justification provided) (duration: 00m 07s)
  • 20:21 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@a68eaf2]: (no justification provided)
  • 20:20 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wmf-config] Undeploy safety survey from PT wiki - PRODUCTION (T305855) (duration: 02m 11s)
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24525 and previous config saved to /var/cache/conftool/dbconfig/20220412-200850-ladsgroup.json
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.7 refs T305213
  • 19:56 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.7 refs T305213 (duration: 49m 10s)
  • 19:54 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1013.eqiad.wmnet with OS bullseye
  • 19:39 volans: uploaded spicerack_2.4.1 to apt.wikimedia.org bullseye-wikimedia
  • 19:21 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: Upgrading Elasticsearch to 6.8 in CODFW - bking@cumin1001 - T301958
  • 19:18 ryankemper: T295666 Gearing up for rolling upgrade of codfw cirrus to `6.8.23`. Commencing operation shortly. Will be using a batch size of 3 hosts
  • 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24524 and previous config saved to /var/cache/conftool/dbconfig/20220412-191310-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 19:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24523 and previous config saved to /var/cache/conftool/dbconfig/20220412-191302-ladsgroup.json
  • 19:07 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.7 refs T305213
  • 19:06 dancy@deploy1002: deploy-promote aborted: (duration: 00m 14s)
  • 19:06 dancy@deploy1002: prep aborted: (duration: 00m 11s)
  • 19:06 dancy@deploy1002: deploy-promote aborted: (duration: 01m 09s)
  • 19:05 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.7 refs T305213
  • 19:00 dancy@deploy1002: scap failed: TypeError cannot unpack non-iterable NoneType object (duration: 01m 34s)
  • 18:59 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.7 refs T305213
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24522 and previous config saved to /var/cache/conftool/dbconfig/20220412-185757-ladsgroup.json
  • 18:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24521 and previous config saved to /var/cache/conftool/dbconfig/20220412-184252-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24520 and previous config saved to /var/cache/conftool/dbconfig/20220412-182747-ladsgroup.json
  • 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24517 and previous config saved to /var/cache/conftool/dbconfig/20220412-173430-ladsgroup.json
  • 17:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 17:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24516 and previous config saved to /var/cache/conftool/dbconfig/20220412-173422-ladsgroup.json
  • 17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24515 and previous config saved to /var/cache/conftool/dbconfig/20220412-171917-ladsgroup.json
  • 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24514 and previous config saved to /var/cache/conftool/dbconfig/20220412-170412-ladsgroup.json
  • 17:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2027.codfw.wmnet with OS buster
  • 16:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1013.eqiad.wmnet with reason: host reimage
  • 16:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2027.codfw.wmnet with reason: host reimage
  • 16:53 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1013.eqiad.wmnet with reason: host reimage
  • 16:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2027.codfw.wmnet with reason: host reimage
  • 16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24513 and previous config saved to /var/cache/conftool/dbconfig/20220412-164907-ladsgroup.json
  • 16:42 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1013.eqiad.wmnet with OS bullseye
  • 16:33 mutante: gitlab: pausing runner-1013, then will remove it and create new bullseye runner to replace it
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS buster
  • 16:08 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1308.eqiad.wmnet
  • 16:01 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24512 and previous config saved to /var/cache/conftool/dbconfig/20220412-155143-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 15:49 arturo: aborrero@apt1001:~ $ sudo -i reprepro -C component/prometheus-openstack-exporter includedeb bullseye-wikimedia ${PWD}/prometheus-openstack-exporter_1.5.0-1_amd64.deb (T302178)
  • 15:46 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudstore1010.wikimedia.org with OS bullseye
  • 15:45 arturo: removed a bunch of old src & binary packages for prometheus-openstack-exporter (T302178)
  • 15:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2026.codfw.wmnet
  • 15:36 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1013.eqiad.wmnet with OS bullseye
  • 15:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2020.codfw.wmnet
  • 15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2017.codfw.wmnet
  • 15:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2019.codfw.wmnet
  • 15:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudstore1011.wikimedia.org with OS bullseye
  • 15:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudstore1010.wikimedia.org with OS bullseye
  • 15:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudstore1010.wikimedia.org with OS bullseye
  • 15:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudstore1010.wikimedia.org with OS bullseye
  • 15:10 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb1013.eqiad.wmnet with OS bullseye
  • 15:09 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp5002.eqsin.wmnet
  • 15:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1013.eqiad.wmnet with reason: Upgrade clouddb1013 to bullseye
  • 15:08 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1013.eqiad.wmnet with reason: Upgrade clouddb1013 to bullseye
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:06 dancy@deploy1002: Synchronized php-1.39.0-wmf.6/includes/EditPage.php: Backport: Temporarily undeprecate EditPage::$textbox2 (T305028) (duration: 00m 52s)
  • 15:05 hnowlan@deploy1002: Finished deploy [restbase/deploy@627f7d7]: add guw.wikipedia.org (duration: 15m 56s)
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:49 hnowlan@deploy1002: Started deploy [restbase/deploy@627f7d7]: add guw.wikipedia.org
  • 14:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudstore1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:47 hnowlan@deploy1002: Finished deploy [restbase/deploy@31675fb]: add guw.wikipedia.org (duration: 00m 22s)
  • 14:46 hnowlan@deploy1002: Started deploy [restbase/deploy@31675fb]: add guw.wikipedia.org
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudstore1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudstore1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudstore1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:38 topranks: Adding loopback4 filter to lo0.0 interface ingress lsw1-e1-eqiad T304553
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wcqs2001.codfw.wmnet
  • 13:28 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wcqs2001.codfw.wmnet
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:26 Lucas_WMDE: UTC afternoon backport window done
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24510 and previous config saved to /var/cache/conftool/dbconfig/20220412-132400-ladsgroup.json
  • 13:23 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.6/extensions/Wikibase/client/includes/Store/Sql/UnexpectedUnconnectedPagePrimer.php: Backport: Don’t use session-consistent connections in UnexpectedUnconnectedPagePrimer (duration: 00m 57s)
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 urbanecm@deploy1002: Synchronized multiversion/buildConfigCache.php: 8b74b08: Migrate $wmfConfigDir to $configDir (T45956) (duration: 00m 51s)
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24509 and previous config saved to /var/cache/conftool/dbconfig/20220412-130855-ladsgroup.json
  • 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 572e621: Remove unused wgKartographerDfltStyle after tegola roll out (T298249) (duration: 00m 52s)
  • 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24508 and previous config saved to /var/cache/conftool/dbconfig/20220412-125350-ladsgroup.json
  • 12:51 topranks: modify loopback filter on cr3-ulsfo to add terms needed in evpn context T304553
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24507 and previous config saved to /var/cache/conftool/dbconfig/20220412-123845-ladsgroup.json
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311', diff saved to https://phabricator.wikimedia.org/P24506 and previous config saved to /var/cache/conftool/dbconfig/20220412-121744-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071', diff saved to https://phabricator.wikimedia.org/P24505 and previous config saved to /var/cache/conftool/dbconfig/20220412-121254-root.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24504 and previous config saved to /var/cache/conftool/dbconfig/20220412-114152-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 11:36 marostegui: dbmaint s4@eqiad T300775
  • 11:11 marostegui: dbmaint s4@eqiad T298554
  • 10:58 marostegui: dbmaint s4@eqiad T300992
  • 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:41 klausman: restarting pybal on lvs2009 for change 779449
  • 10:35 klausman: restarting pybal on lvs2010 for change 779449
  • 10:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24502 and previous config saved to /var/cache/conftool/dbconfig/20220412-100719-ladsgroup.json
  • 10:02 jayme: running logrotate /etc/logrotate.d/rsyslog --force on ml-staging-ctrl2001 (no space left on device)
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130', diff saved to https://phabricator.wikimedia.org/P24501 and previous config saved to /var/cache/conftool/dbconfig/20220412-100147-root.json
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24500 and previous config saved to /var/cache/conftool/dbconfig/20220412-095214-ladsgroup.json
  • 09:44 jayme: running logrotate /etc/logrotate.d/rsyslog --force on ml-staging-ctrl2002 (no space left on device)
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24499 and previous config saved to /var/cache/conftool/dbconfig/20220412-093709-ladsgroup.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool all codfw hosts that went down for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P24498 and previous config saved to /var/cache/conftool/dbconfig/20220412-092846-root.json
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071 db2072', diff saved to https://phabricator.wikimedia.org/P24497 and previous config saved to /var/cache/conftool/dbconfig/20220412-092730-root.json
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24495 and previous config saved to /var/cache/conftool/dbconfig/20220412-092204-ladsgroup.json
  • 09:11 marostegui: dbmaint s4@eqiad T298557
  • 09:01 marostegui: dbmaint s7@eqiad T297189
  • 08:51 marostegui: dbmaint s4@eqiad T298294
  • 08:51 marostegui: dbmaint s4@eqiad T298556
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24494 and previous config saved to /var/cache/conftool/dbconfig/20220412-083000-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24493 and previous config saved to /var/cache/conftool/dbconfig/20220412-082948-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24492 and previous config saved to /var/cache/conftool/dbconfig/20220412-081443-ladsgroup.json
  • 08:03 gmodena@deploy1002: Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 05m 20s)
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24491 and previous config saved to /var/cache/conftool/dbconfig/20220412-075938-ladsgroup.json
  • 07:57 gmodena@deploy1002: Started deploy [airflow-dags/research@b029f10]: (no justification provided)
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24490 and previous config saved to /var/cache/conftool/dbconfig/20220412-074433-ladsgroup.json
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1138.eqiad.wmnet with OS bullseye
  • 07:04 marostegui: dbmaint s4@eqiad T300381
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1138.eqiad.wmnet with reason: host reimage
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1138.eqiad.wmnet with reason: host reimage
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24489 and previous config saved to /var/cache/conftool/dbconfig/20220412-065102-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1138.eqiad.wmnet with OS bullseye
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 T304933', diff saved to https://phabricator.wikimedia.org/P24487 and previous config saved to /var/cache/conftool/dbconfig/20220412-060628-root.json
  • 06:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1160 to s4 primary and set section read-write T304933', diff saved to https://phabricator.wikimedia.org/P24486 and previous config saved to /var/cache/conftool/dbconfig/20220412-060125-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T304933', diff saved to https://phabricator.wikimedia.org/P24485 and previous config saved to /var/cache/conftool/dbconfig/20220412-060057-root.json
  • 06:00 marostegui: Starting s4 eqiad failover from db1138 to db1160 - T304933
  • 05:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 05:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24484 and previous config saved to /var/cache/conftool/dbconfig/20220412-051633-ladsgroup.json
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24483 and previous config saved to /var/cache/conftool/dbconfig/20220412-050128-ladsgroup.json
  • 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1160 with weight 0 T304933', diff saved to https://phabricator.wikimedia.org/P24482 and previous config saved to /var/cache/conftool/dbconfig/20220412-045023-root.json
  • 04:50 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 31 hosts with reason: Primary switchover s4 T304933
  • 04:49 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 31 hosts with reason: Primary switchover s4 T304933
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24481 and previous config saved to /var/cache/conftool/dbconfig/20220412-044623-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24480 and previous config saved to /var/cache/conftool/dbconfig/20220412-043118-ladsgroup.json
  • 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24479 and previous config saved to /var/cache/conftool/dbconfig/20220412-033633-ladsgroup.json
  • 03:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 03:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 02:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:03 eileen: config revision changed from c4cab5b1 to c8fee00c
  • 02:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:02 eileen: civicrm revision changed from a90a6709 to 7de7ddd4
  • 01:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 01:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 01:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 01:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24478 and previous config saved to /var/cache/conftool/dbconfig/20220412-010438-ladsgroup.json
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24477 and previous config saved to /var/cache/conftool/dbconfig/20220412-004933-ladsgroup.json
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24476 and previous config saved to /var/cache/conftool/dbconfig/20220412-003428-ladsgroup.json
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24475 and previous config saved to /var/cache/conftool/dbconfig/20220412-001923-ladsgroup.json

2022-04-11

  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24474 and previous config saved to /var/cache/conftool/dbconfig/20220411-232102-ladsgroup.json
  • 23:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 23:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 23:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24473 and previous config saved to /var/cache/conftool/dbconfig/20220411-232045-ladsgroup.json
  • 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24472 and previous config saved to /var/cache/conftool/dbconfig/20220411-230540-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24471 and previous config saved to /var/cache/conftool/dbconfig/20220411-225035-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24470 and previous config saved to /var/cache/conftool/dbconfig/20220411-223530-ladsgroup.json
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24469 and previous config saved to /var/cache/conftool/dbconfig/20220411-214408-ladsgroup.json
  • 21:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24468 and previous config saved to /var/cache/conftool/dbconfig/20220411-214400-ladsgroup.json
  • 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24467 and previous config saved to /var/cache/conftool/dbconfig/20220411-212855-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24466 and previous config saved to /var/cache/conftool/dbconfig/20220411-211350-ladsgroup.json
  • 21:02 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GlobalBlocking/maintenance/PopulateCentralId.php --wiki=metawiki # END, T305014
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24463 and previous config saved to /var/cache/conftool/dbconfig/20220411-205844-ladsgroup.json
  • 20:45 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GlobalBlocking/maintenance/PopulateCentralId.php --wiki=metawiki # START, T305014, running in a tmux under my account at mwmaint1002
  • 20:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 17c8c17: Start writing to cuc_actor in guwwiki and shnwikivoyage (T233004) (duration: 00m 51s)
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:15 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 8455fa0: Stop writing to $wmfUsingKubernetes (T45956) (duration: 00m 51s)
  • 20:11 urbanecm@deploy1002: Synchronized wmf-config/: d4ff32f: Migrate $wmfUsingKubernetes to $wmgUsingKubernetes (T45956) (duration: 00m 53s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24461 and previous config saved to /var/cache/conftool/dbconfig/20220411-200301-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24460 and previous config saved to /var/cache/conftool/dbconfig/20220411-200253-ladsgroup.json
  • 19:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T297189)', diff saved to https://phabricator.wikimedia.org/P24459 and previous config saved to /var/cache/conftool/dbconfig/20220411-194812-marostegui.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24458 and previous config saved to /var/cache/conftool/dbconfig/20220411-194748-ladsgroup.json
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P24457 and previous config saved to /var/cache/conftool/dbconfig/20220411-193307-marostegui.json
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24456 and previous config saved to /var/cache/conftool/dbconfig/20220411-193243-ladsgroup.json
  • 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P24454 and previous config saved to /var/cache/conftool/dbconfig/20220411-191802-marostegui.json
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24453 and previous config saved to /var/cache/conftool/dbconfig/20220411-191738-ladsgroup.json
  • 19:17 mutante: runner-1022.gitlab-runners - rm -rf /var/lib/puppet/ssl ; run puppet; sign new request on gitlab-runners-puppetmaster-01.gitlab-runners (normal procedure needed when creating fresh instance in project with local puppetmaster) T297659
  • 19:09 mutante: gitlab - deleting runner-1011, creating new runner runner-1022 using bullseye
  • 19:08 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: testing spicerack
  • 19:08 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: testing spicerack
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T297189)', diff saved to https://phabricator.wikimedia.org/P24452 and previous config saved to /var/cache/conftool/dbconfig/20220411-190257-marostegui.json
  • 19:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: testing spicerack
  • 19:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: testing spicerack
  • 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:26 mutante: gitlab-runners: pausing runner-1011 in gitlab UI from accepting new jobs, then deleting instance in Horizon UI to replace it with another bullseye instance T297659
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24451 and previous config saved to /var/cache/conftool/dbconfig/20220411-182258-ladsgroup.json
  • 18:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24450 and previous config saved to /var/cache/conftool/dbconfig/20220411-182250-ladsgroup.json
  • 18:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T297189)', diff saved to https://phabricator.wikimedia.org/P24449 and previous config saved to /var/cache/conftool/dbconfig/20220411-181947-marostegui.json
  • 18:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24448 and previous config saved to /var/cache/conftool/dbconfig/20220411-181939-marostegui.json
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24447 and previous config saved to /var/cache/conftool/dbconfig/20220411-180745-ladsgroup.json
  • 18:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P24446 and previous config saved to /var/cache/conftool/dbconfig/20220411-180433-marostegui.json
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24445 and previous config saved to /var/cache/conftool/dbconfig/20220411-175240-ladsgroup.json
  • 17:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P24444 and previous config saved to /var/cache/conftool/dbconfig/20220411-174928-marostegui.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24443 and previous config saved to /var/cache/conftool/dbconfig/20220411-173735-ladsgroup.json
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24442 and previous config saved to /var/cache/conftool/dbconfig/20220411-173423-marostegui.json
  • 17:31 papaul: powerdown rdb2008 for relocation
  • 17:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:24 papaul: powerdown kubestage2002 for relocation
  • 17:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: moving to a different rack
  • 17:11 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: moving to a different rack
  • 17:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: moving to a different rack
  • 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: moving to a different rack
  • 17:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2008.codfw.wmnet with reason: moving to a different rack
  • 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2008.codfw.wmnet with reason: moving to a different rack
  • 16:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: physically moving host
  • 16:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: physically moving host
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24440 and previous config saved to /var/cache/conftool/dbconfig/20220411-164144-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24439 and previous config saved to /var/cache/conftool/dbconfig/20220411-164136-ladsgroup.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24438 and previous config saved to /var/cache/conftool/dbconfig/20220411-163248-marostegui.json
  • 16:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24437 and previous config saved to /var/cache/conftool/dbconfig/20220411-163240-marostegui.json
  • 16:30 aqu@deploy1002: Finished deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024] (duration: 00m 08s)
  • 16:30 aqu@deploy1002: Started deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024]
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24436 and previous config saved to /var/cache/conftool/dbconfig/20220411-162630-ladsgroup.json
  • 16:20 papaul: powerdown maps2006 for relocation
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P24435 and previous config saved to /var/cache/conftool/dbconfig/20220411-161735-marostegui.json
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24434 and previous config saved to /var/cache/conftool/dbconfig/20220411-161125-ladsgroup.json
  • 16:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Use wgRestAPIAdditionalRouteFiles for WB REST API (duration: 00m 51s)
  • 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P24432 and previous config saved to /var/cache/conftool/dbconfig/20220411-160230-marostegui.json
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24431 and previous config saved to /var/cache/conftool/dbconfig/20220411-155620-ladsgroup.json
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24430 and previous config saved to /var/cache/conftool/dbconfig/20220411-154725-marostegui.json
  • 15:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:34 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 15:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24429 and previous config saved to /var/cache/conftool/dbconfig/20220411-150117-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:52 mvernon@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=swift,service=swift-fe
  • 14:52 mvernon@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=swift,service=nginx
  • 14:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1012.eqiad.wmnet with OS bullseye
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24428 and previous config saved to /var/cache/conftool/dbconfig/20220411-143411-marostegui.json
  • 14:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24427 and previous config saved to /var/cache/conftool/dbconfig/20220411-143403-marostegui.json
  • 14:22 Guest9647: powerdown ganeti2019 for relocation
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P24426 and previous config saved to /var/cache/conftool/dbconfig/20220411-141858-marostegui.json
  • 14:18 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:17 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P24425 and previous config saved to /var/cache/conftool/dbconfig/20220411-141428-root.json
  • 14:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 14:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 14:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 14:07 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P24424 and previous config saved to /var/cache/conftool/dbconfig/20220411-140415-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P24423 and previous config saved to /var/cache/conftool/dbconfig/20220411-140353-marostegui.json
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P24422 and previous config saved to /var/cache/conftool/dbconfig/20220411-135343-root.json
  • 13:53 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1012.eqiad.wmnet with OS bullseye
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24421 and previous config saved to /var/cache/conftool/dbconfig/20220411-134848-marostegui.json
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24420 and previous config saved to /var/cache/conftool/dbconfig/20220411-132422-ladsgroup.json
  • 13:11 aqu@deploy1002: Finished deploy [analytics/refinery@f0a1656] (hadoop-test): Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656] (duration: 07m 00s)
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24419 and previous config saved to /var/cache/conftool/dbconfig/20220411-130916-ladsgroup.json
  • 13:04 aqu@deploy1002: Started deploy [analytics/refinery@f0a1656] (hadoop-test): Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656]
  • 13:03 aqu@deploy1002: Finished deploy [analytics/refinery@f0a1656] (thin): Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656] (duration: 00m 07s)
  • 13:03 aqu@deploy1002: Started deploy [analytics/refinery@f0a1656] (thin): Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656]
  • 12:58 aqu@deploy1002: Finished deploy [analytics/refinery@f0a1656]: Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656] (duration: 20m 23s)
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24418 and previous config saved to /var/cache/conftool/dbconfig/20220411-125411-ladsgroup.json
  • 12:48 aqu@deploy1002: Finished deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024] (duration: 00m 32s)
  • 12:47 aqu@deploy1002: Started deploy [airflow-dags/analytics@cae0024]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@cae0024]
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24417 and previous config saved to /var/cache/conftool/dbconfig/20220411-123906-ladsgroup.json
  • 12:37 aqu@deploy1002: Started deploy [analytics/refinery@f0a1656]: Migrate mediarequest hourly from Oozie to Airflow [analytics/refinery@f0a1656]
  • 12:36 aqu: About to deploy analytics/refinery "Migrate mediarequest hourly from Oozie to Airflow"
  • 12:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1151.eqiad.wmnet with reason: Rebooting for T303174
  • 12:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1151.eqiad.wmnet with reason: Rebooting for T303174
  • 12:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting for T303174
  • 12:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting for T303174
  • 12:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Rebooting x2 codfw primary T303174
  • 12:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Rebooting x2 codfw primary T303174
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T297189)', diff saved to https://phabricator.wikimedia.org/P24416 and previous config saved to /var/cache/conftool/dbconfig/20220411-122220-marostegui.json
  • 12:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T297189)', diff saved to https://phabricator.wikimedia.org/P24415 and previous config saved to /var/cache/conftool/dbconfig/20220411-122212-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P24414 and previous config saved to /var/cache/conftool/dbconfig/20220411-120707-marostegui.json
  • 12:02 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:56 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P24413 and previous config saved to /var/cache/conftool/dbconfig/20220411-115202-marostegui.json
  • 11:46 topranks: Adjust loopback filter on asw1-b12-drmrs to align with CR router config. T304553.
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24412 and previous config saved to /var/cache/conftool/dbconfig/20220411-114053-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24411 and previous config saved to /var/cache/conftool/dbconfig/20220411-114041-ladsgroup.json
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T297189)', diff saved to https://phabricator.wikimedia.org/P24410 and previous config saved to /var/cache/conftool/dbconfig/20220411-113657-marostegui.json
  • 11:34 topranks: Adjust loopback filter on cr3-ulsfo to align with L3 switch config. T304553.
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P24409 and previous config saved to /var/cache/conftool/dbconfig/20220411-112825-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1106', diff saved to https://phabricator.wikimedia.org/P24408 and previous config saved to /var/cache/conftool/dbconfig/20220411-112741-root.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24407 and previous config saved to /var/cache/conftool/dbconfig/20220411-112536-ladsgroup.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P24406 and previous config saved to /var/cache/conftool/dbconfig/20220411-112452-root.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P24405 and previous config saved to /var/cache/conftool/dbconfig/20220411-112229-root.json
  • 11:18 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24404 and previous config saved to /var/cache/conftool/dbconfig/20220411-111030-ladsgroup.json
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24403 and previous config saved to /var/cache/conftool/dbconfig/20220411-105525-ladsgroup.json
  • 10:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2121.codfw.wmnet with reason: Rebooting for T303174
  • 10:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2121.codfw.wmnet with reason: Rebooting for T303174
  • 10:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2121.codfw.wmnet with reason: Rebooting for T303174
  • 10:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2121.codfw.wmnet with reason: Rebooting for T303174
  • 10:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Rebooting primary T303174
  • 10:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Rebooting primary T303174
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T297189)', diff saved to https://phabricator.wikimedia.org/P24402 and previous config saved to /var/cache/conftool/dbconfig/20220411-103336-marostegui.json
  • 10:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:10 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:01 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:59 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.6/extensions/TimedMediaHandler/resources/ext.tmh.player.element.js: Backport: Older browser do not return a promise from .play() (T304705) (duration: 00m 52s)
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24401 and previous config saved to /var/cache/conftool/dbconfig/20220411-095826-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 09:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable videojs on wiktionary wikis (T248418) (duration: 00m 52s)
  • 09:39 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2007.codfw.wmnet with OS bullseye
  • 09:28 kart_: Updated cxserver to 2022-04-11-085026-production (T305125)
  • 09:26 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 09:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 09:25 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2007.codfw.wmnet with reason: host reimage
  • 09:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 9 hosts with reason: Maintenance
  • 09:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 9 hosts with reason: Maintenance
  • 09:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 09:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 09:19 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2007.codfw.wmnet with reason: host reimage
  • 09:17 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:17 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P24400 and previous config saved to /var/cache/conftool/dbconfig/20220411-091512-root.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1135', diff saved to https://phabricator.wikimedia.org/P24399 and previous config saved to /var/cache/conftool/dbconfig/20220411-091455-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1135', diff saved to https://phabricator.wikimedia.org/P24398 and previous config saved to /var/cache/conftool/dbconfig/20220411-091319-root.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1135', diff saved to https://phabricator.wikimedia.org/P24397 and previous config saved to /var/cache/conftool/dbconfig/20220411-091103-root.json
  • 09:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24396 and previous config saved to /var/cache/conftool/dbconfig/20220411-091011-ladsgroup.json
  • 08:59 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2007.codfw.wmnet with OS bullseye
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24395 and previous config saved to /var/cache/conftool/dbconfig/20220411-085506-ladsgroup.json
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24394 and previous config saved to /var/cache/conftool/dbconfig/20220411-084001-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24393 and previous config saved to /var/cache/conftool/dbconfig/20220411-082456-ladsgroup.json
  • 08:23 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a337e34]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@a337e34] (duration: 00m 07s)
  • 08:23 aqu@deploy1002: Started deploy [airflow-dags/analytics@a337e34]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@a337e34]
  • 08:23 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a337e34]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics_test@a337e34] (duration: 00m 08s)
  • 08:22 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a337e34]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics_test@a337e34]
  • 08:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T297189)', diff saved to https://phabricator.wikimedia.org/P24392 and previous config saved to /var/cache/conftool/dbconfig/20220411-082130-marostegui.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24391 and previous config saved to /var/cache/conftool/dbconfig/20220411-080625-marostegui.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135', diff saved to https://phabricator.wikimedia.org/P24390 and previous config saved to /var/cache/conftool/dbconfig/20220411-080402-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1163', diff saved to https://phabricator.wikimedia.org/P24389 and previous config saved to /var/cache/conftool/dbconfig/20220411-080344-root.json
  • 08:02 aqu@deploy1002: Finished deploy [airflow-dags/analytics@63cbb55]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@63cbb55] (duration: 04m 21s)
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P24388 and previous config saved to /var/cache/conftool/dbconfig/20220411-080047-root.json
  • 07:57 aqu@deploy1002: Started deploy [airflow-dags/analytics@63cbb55]: T302876_migrate_mediarequest_to_airflow [airflow-dags/analytics@63cbb55]
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P24387 and previous config saved to /var/cache/conftool/dbconfig/20220411-075214-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24386 and previous config saved to /var/cache/conftool/dbconfig/20220411-075120-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T297189)', diff saved to https://phabricator.wikimedia.org/P24385 and previous config saved to /var/cache/conftool/dbconfig/20220411-073615-marostegui.json
  • 07:35 dcausse: restarting blazegraph on wdqs1004 (BlazegraphFreeAllocatorsDecreasingRapidly fired over the week-end)
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24384 and previous config saved to /var/cache/conftool/dbconfig/20220411-072556-ladsgroup.json
  • 07:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 07:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24383 and previous config saved to /var/cache/conftool/dbconfig/20220411-072548-ladsgroup.json
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24382 and previous config saved to /var/cache/conftool/dbconfig/20220411-071043-ladsgroup.json
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24381 and previous config saved to /var/cache/conftool/dbconfig/20220411-065538-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24380 and previous config saved to /var/cache/conftool/dbconfig/20220411-064033-ladsgroup.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T297189)', diff saved to https://phabricator.wikimedia.org/P24379 and previous config saved to /var/cache/conftool/dbconfig/20220411-063601-marostegui.json
  • 06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T297189)', diff saved to https://phabricator.wikimedia.org/P24378 and previous config saved to /var/cache/conftool/dbconfig/20220411-063552-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24377 and previous config saved to /var/cache/conftool/dbconfig/20220411-062047-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24376 and previous config saved to /var/cache/conftool/dbconfig/20220411-060542-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T297189)', diff saved to https://phabricator.wikimedia.org/P24375 and previous config saved to /var/cache/conftool/dbconfig/20220411-055037-marostegui.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1164', diff saved to https://phabricator.wikimedia.org/P24374 and previous config saved to /var/cache/conftool/dbconfig/20220411-054902-root.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24373 and previous config saved to /var/cache/conftool/dbconfig/20220411-054618-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24372 and previous config saved to /var/cache/conftool/dbconfig/20220411-054610-ladsgroup.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1164', diff saved to https://phabricator.wikimedia.org/P24371 and previous config saved to /var/cache/conftool/dbconfig/20220411-054508-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1164', diff saved to https://phabricator.wikimedia.org/P24370 and previous config saved to /var/cache/conftool/dbconfig/20220411-054306-root.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24369 and previous config saved to /var/cache/conftool/dbconfig/20220411-053105-ladsgroup.json
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24368 and previous config saved to /var/cache/conftool/dbconfig/20220411-051600-ladsgroup.json
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24367 and previous config saved to /var/cache/conftool/dbconfig/20220411-050055-ladsgroup.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T297189)', diff saved to https://phabricator.wikimedia.org/P24366 and previous config saved to /var/cache/conftool/dbconfig/20220411-044302-marostegui.json
  • 04:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 04:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 04:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P24365 and previous config saved to /var/cache/conftool/dbconfig/20220411-044058-root.json
  • 04:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24364 and previous config saved to /var/cache/conftool/dbconfig/20220411-040656-ladsgroup.json
  • 04:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 04:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 04:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24363 and previous config saved to /var/cache/conftool/dbconfig/20220411-040648-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24362 and previous config saved to /var/cache/conftool/dbconfig/20220411-035143-ladsgroup.json
  • 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24361 and previous config saved to /var/cache/conftool/dbconfig/20220411-033638-ladsgroup.json
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24360 and previous config saved to /var/cache/conftool/dbconfig/20220411-032132-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24359 and previous config saved to /var/cache/conftool/dbconfig/20220411-022840-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24358 and previous config saved to /var/cache/conftool/dbconfig/20220411-022832-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24357 and previous config saved to /var/cache/conftool/dbconfig/20220411-021327-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24356 and previous config saved to /var/cache/conftool/dbconfig/20220411-015822-ladsgroup.json
  • 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24355 and previous config saved to /var/cache/conftool/dbconfig/20220411-014316-ladsgroup.json
  • 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24354 and previous config saved to /var/cache/conftool/dbconfig/20220411-004826-ladsgroup.json
  • 00:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 00:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24353 and previous config saved to /var/cache/conftool/dbconfig/20220411-004817-ladsgroup.json
  • 00:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24352 and previous config saved to /var/cache/conftool/dbconfig/20220411-003312-ladsgroup.json
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24351 and previous config saved to /var/cache/conftool/dbconfig/20220411-001807-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24350 and previous config saved to /var/cache/conftool/dbconfig/20220411-000302-ladsgroup.json

2022-04-10

  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24349 and previous config saved to /var/cache/conftool/dbconfig/20220410-231112-ladsgroup.json
  • 23:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24348 and previous config saved to /var/cache/conftool/dbconfig/20220410-231104-ladsgroup.json
  • 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24347 and previous config saved to /var/cache/conftool/dbconfig/20220410-225559-ladsgroup.json
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24346 and previous config saved to /var/cache/conftool/dbconfig/20220410-224053-ladsgroup.json
  • 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24345 and previous config saved to /var/cache/conftool/dbconfig/20220410-222548-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24344 and previous config saved to /var/cache/conftool/dbconfig/20220410-212042-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24343 and previous config saved to /var/cache/conftool/dbconfig/20220410-212024-ladsgroup.json
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24342 and previous config saved to /var/cache/conftool/dbconfig/20220410-210519-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24341 and previous config saved to /var/cache/conftool/dbconfig/20220410-205014-ladsgroup.json
  • 20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24340 and previous config saved to /var/cache/conftool/dbconfig/20220410-203508-ladsgroup.json
  • 19:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24339 and previous config saved to /var/cache/conftool/dbconfig/20220410-193900-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance

2022-04-09

  • 12:39 godog: bounce prometheus@ops on prometheus5001
  • 12:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
  • 12:22 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 12:22 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1002.eqiad.wmnet
  • 12:22 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 12:20 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1002.eqiad.wmnet
  • 12:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 03:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24337 and previous config saved to /var/cache/conftool/dbconfig/20220409-030854-ladsgroup.json
  • 02:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24336 and previous config saved to /var/cache/conftool/dbconfig/20220409-025349-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24335 and previous config saved to /var/cache/conftool/dbconfig/20220409-023843-ladsgroup.json
  • 02:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24334 and previous config saved to /var/cache/conftool/dbconfig/20220409-022338-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24333 and previous config saved to /var/cache/conftool/dbconfig/20220409-005351-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24332 and previous config saved to /var/cache/conftool/dbconfig/20220409-005338-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24331 and previous config saved to /var/cache/conftool/dbconfig/20220409-003832-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24330 and previous config saved to /var/cache/conftool/dbconfig/20220409-002327-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24329 and previous config saved to /var/cache/conftool/dbconfig/20220409-000822-ladsgroup.json
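
The entries above share a regular shape: a bullet, an HH:MM timestamp, an actor (either `user` or `user@host`), a colon, and a free-text message. The dbctl commit messages also carry a quoted summary and a Phabricator paste URL. The following is a minimal sketch of splitting a line into those parts, assuming that shape holds. The names `ENTRY_RE`, `DBCTL_RE` and `parse_entry` are hypothetical and not part of any Wikimedia tooling. Lines that lack a timestamp are returned as `None` rather than guessed at.

```python
import re
from typing import Optional

# One bullet entry: "• HH:MM actor: message", where actor is "user" or "user@host".
ENTRY_RE = re.compile(
    r"^\s*•\s*(?P<time>\d{2}:\d{2})\s+(?P<actor>[^\s:]+):\s+(?P<message>.*)$"
)

# dbctl commit messages: datacenter scope, quoted summary, paste URL for the diff.
DBCTL_RE = re.compile(
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<summary>[^']*)', "
    r"diff saved to (?P<paste>\S+)"
)


def parse_entry(line: str) -> Optional[dict]:
    """Split one log bullet into time, user, host, message and dbctl details.

    Returns None when the line does not match the assumed entry shape.
    """
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # "user@host" becomes two fields; a bare "user" leaves host as None.
    user, _, host = entry.pop("actor").partition("@")
    entry["user"] = user
    entry["host"] = host or None
    dbctl = DBCTL_RE.search(entry["message"])
    entry["dbctl"] = dbctl.groupdict() if dbctl else None
    return entry
```

Calling `parse_entry` on a dbctl line gives the actor split into `user` and `host`, plus a `dbctl` dictionary with the `dc`, `summary` and `paste` fields. Manually logged entries such as the `godog` one have `host` and `dbctl` set to `None`.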

2022-04-08

  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24328 and previous config saved to /var/cache/conftool/dbconfig/20220408-225350-ladsgroup.json
  • 22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24327 and previous config saved to /var/cache/conftool/dbconfig/20220408-225342-ladsgroup.json
  • 22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24326 and previous config saved to /var/cache/conftool/dbconfig/20220408-223837-ladsgroup.json
  • 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24325 and previous config saved to /var/cache/conftool/dbconfig/20220408-222332-ladsgroup.json
  • 22:09 mutante: gitlab - deleted runner-1008 (to replace it with a bullseye instance), recreated runner-1020 with same flavor as existing runners T297659
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24324 and previous config saved to /var/cache/conftool/dbconfig/20220408-220827-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24323 and previous config saved to /var/cache/conftool/dbconfig/20220408-204138-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24322 and previous config saved to /var/cache/conftool/dbconfig/20220408-204129-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24321 and previous config saved to /var/cache/conftool/dbconfig/20220408-202624-ladsgroup.json
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24320 and previous config saved to /var/cache/conftool/dbconfig/20220408-201119-ladsgroup.json
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24319 and previous config saved to /var/cache/conftool/dbconfig/20220408-195614-ladsgroup.json
  • 18:38 mutante: gitlab1001 - giving myself gitlab admin rights via rake console, to be able to connect/disconnect runners T297659
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24318 and previous config saved to /var/cache/conftool/dbconfig/20220408-183643-ladsgroup.json
  • 18:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24317 and previous config saved to /var/cache/conftool/dbconfig/20220408-183635-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24316 and previous config saved to /var/cache/conftool/dbconfig/20220408-182130-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24315 and previous config saved to /var/cache/conftool/dbconfig/20220408-180625-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24313 and previous config saved to /var/cache/conftool/dbconfig/20220408-175120-ladsgroup.json
  • 17:35 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:35 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:34 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:34 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24312 and previous config saved to /var/cache/conftool/dbconfig/20220408-162938-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24311 and previous config saved to /var/cache/conftool/dbconfig/20220408-162930-ladsgroup.json
  • 16:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24309 and previous config saved to /var/cache/conftool/dbconfig/20220408-155919-ladsgroup.json
  • 15:53 dancy: dancy@deploy1002: Testing mw container image build
  • 15:52 dancy@deploy1002: Started scap: (no justification provided)
  • 15:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:51 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24308 and previous config saved to /var/cache/conftool/dbconfig/20220408-154414-ladsgroup.json
  • away: re-enabled fundraising scheduled jobs
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24307 and previous config saved to /var/cache/conftool/dbconfig/20220408-143545-root.json
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24306 and previous config saved to /var/cache/conftool/dbconfig/20220408-142239-ladsgroup.json
  • 14:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24305 and previous config saved to /var/cache/conftool/dbconfig/20220408-142230-ladsgroup.json
  • 14:21 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again again again again; keeping queue below the p.age threshold while fr-tech work)
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24304 and previous config saved to /var/cache/conftool/dbconfig/20220408-142041-root.json
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24303 and previous config saved to /var/cache/conftool/dbconfig/20220408-140725-ladsgroup.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24302 and previous config saved to /var/cache/conftool/dbconfig/20220408-140536-root.json
  • 14:02 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1008.eqiad.wmnet with OS bullseye
  • 13:57 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2008.codfw.wmnet with OS bullseye
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24300 and previous config saved to /var/cache/conftool/dbconfig/20220408-135220-ladsgroup.json
  • 13:51 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1008.eqiad.wmnet with reason: host reimage
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P24299 and previous config saved to /var/cache/conftool/dbconfig/20220408-135032-root.json
  • 13:50 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again again again again; keeping queue below the p.age threshold while fr-tech work)
  • 13:47 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1008.eqiad.wmnet with reason: host reimage
  • 13:46 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2008.codfw.wmnet with reason: host reimage
  • 13:43 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2008.codfw.wmnet with reason: host reimage
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24298 and previous config saved to /var/cache/conftool/dbconfig/20220408-133715-ladsgroup.json
  • 13:37 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1008.eqiad.wmnet with OS bullseye
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P24297 and previous config saved to /var/cache/conftool/dbconfig/20220408-133528-root.json
  • 13:30 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2008.codfw.wmnet with OS bullseye
  • 13:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubemaster1001.eqiad.wmnet with reason: reimage
  • 13:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubemaster1001.eqiad.wmnet with reason: reimage
  • 13:20 mmandere: pool cp6001 with HAProxy as TLS termination layer - T290005
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P24296 and previous config saved to /var/cache/conftool/dbconfig/20220408-132024-root.json
  • 13:18 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again again again; keeping queue below the p.age threshold while fr-tech work)
  • 13:16 mmandere: pool cp6009 with HAProxy as TLS termination layer - T290005
  • 13:13 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS buster
  • 13:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
  • 13:00 gmodena@deploy1002: Finished deploy [airflow-dags/research@b029f10]: (no justification provided) (duration: 02m 11s)
  • 12:59 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again again)
  • 12:58 gmodena@deploy1002: Started deploy [airflow-dags/research@b029f10]: (no justification provided)
  • 12:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubemaster1002.eqiad.wmnet with reason: reimage
  • 12:57 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubemaster1002.eqiad.wmnet with reason: reimage
  • 12:54 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001 (again)
  • 12:49 ejegg: disabled paypal IPN listener failmail
  • 12:44 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
  • 12:40 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
  • 12:33 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 12:29 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 12:22 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
  • 12:15 mmandere: depool cp6001 for reimage - T290005
  • 12:11 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS buster
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24295 and previous config saved to /var/cache/conftool/dbconfig/20220408-121138-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:45 Emperor: exiqgrep -i -r fr-tech-failmail@wikimedia.org | xargs exim -Mrm on mx1001
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184', diff saved to https://phabricator.wikimedia.org/P24294 and previous config saved to /var/cache/conftool/dbconfig/20220408-113452-root.json
  • 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:11 mmandere: depool cp6009 for reimage - T290005
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:18 mmandere: pool cp6002 with HAProxy as TLS termination layer - T290005
  • 10:15 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1003.eqiad.wmnet with OS bullseye
  • 10:11 mmandere: pool cp6010 with HAProxy as TLS termination layer - T290005
  • 10:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS buster
  • 10:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS buster
  • 10:04 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1003.eqiad.wmnet with reason: host reimage
  • 10:00 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1003.eqiad.wmnet with reason: host reimage
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24293 and previous config saved to /var/cache/conftool/dbconfig/20220408-095458-ladsgroup.json
  • 09:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS buster
  • 09:48 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host dbprov1003.eqiad.wmnet with OS bullseye
  • 09:47 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2003.codfw.wmnet with OS bullseye
  • 09:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24292 and previous config saved to /var/cache/conftool/dbconfig/20220408-094325-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24291 and previous config saved to /var/cache/conftool/dbconfig/20220408-093953-ladsgroup.json
  • 09:35 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1002.eqiad.wmnet with OS bullseye
  • 09:35 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2003.codfw.wmnet with reason: host reimage
  • 09:32 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
  • 09:30 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2003.codfw.wmnet with reason: host reimage
  • 09:29 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24290 and previous config saved to /var/cache/conftool/dbconfig/20220408-092820-ladsgroup.json
  • 09:25 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS buster
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24289 and previous config saved to /var/cache/conftool/dbconfig/20220408-092448-ladsgroup.json
  • 09:24 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: host reimage
  • 09:19 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: host reimage
  • 09:16 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host dbprov2003.codfw.wmnet with OS bullseye
  • 09:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 09:13 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS buster
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24288 and previous config saved to /var/cache/conftool/dbconfig/20220408-091315-ladsgroup.json
  • 09:13 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24287 and previous config saved to /var/cache/conftool/dbconfig/20220408-090943-ladsgroup.json
  • 09:08 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host dbprov1002.eqiad.wmnet with OS bullseye
  • 09:08 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2002.codfw.wmnet with OS bullseye
  • 09:02 mmandere: depool cp6002 for reimage - T290005
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24286 and previous config saved to /var/cache/conftool/dbconfig/20220408-085810-ladsgroup.json
  • 08:57 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2102.codfw.wmnet with OS bullseye
  • 08:57 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS buster
  • 08:56 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2002.codfw.wmnet with reason: host reimage
  • 08:53 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2002.codfw.wmnet with reason: host reimage
  • 08:49 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1001.eqiad.wmnet with OS bullseye
  • 08:48 mmandere: depool cp6010 for reimage - T290005
  • 08:43 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2102.codfw.wmnet with reason: host reimage
  • 08:41 mmandere: pool cp6003 with HAProxy as TLS termination layer - T290005
  • 08:40 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2102.codfw.wmnet with reason: host reimage
  • 08:40 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host dbprov2002.codfw.wmnet with OS bullseye
  • 08:37 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS buster
  • 08:36 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1001.eqiad.wmnet with reason: host reimage
  • 08:33 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1001.eqiad.wmnet with reason: host reimage
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24285 and previous config saved to /var/cache/conftool/dbconfig/20220408-083353-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24284 and previous config saved to /var/cache/conftool/dbconfig/20220408-083345-ladsgroup.json
  • 08:33 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2151.codfw.wmnet with OS bullseye
  • 08:29 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2102.codfw.wmnet with OS bullseye
  • 08:26 mmandere: pool cp6011 with HAProxy as TLS termination layer - T290005
  • 08:24 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
  • 08:21 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host dbprov1001.eqiad.wmnet with OS bullseye
  • 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24283 and previous config saved to /var/cache/conftool/dbconfig/20220408-081840-ladsgroup.json
  • 08:18 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2151.codfw.wmnet with reason: host reimage
  • 08:15 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2151.codfw.wmnet with reason: host reimage
  • 08:10 jynus: restart db1133 T299876
  • 08:06 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2001.codfw.wmnet with OS bullseye
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24282 and previous config saved to /var/cache/conftool/dbconfig/20220408-080335-ladsgroup.json
  • 08:01 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2151.codfw.wmnet with OS bullseye
  • 07:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
  • 07:54 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2001.codfw.wmnet with reason: host reimage
  • 07:50 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2001.codfw.wmnet with reason: host reimage
  • 07:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24281 and previous config saved to /var/cache/conftool/dbconfig/20220408-074829-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24280 and previous config saved to /var/cache/conftool/dbconfig/20220408-074723-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
  • 07:45 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
  • 07:42 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
  • 07:42 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 07:39 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 07:36 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host dbprov2001.codfw.wmnet with OS bullseye
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24279 and previous config saved to /var/cache/conftool/dbconfig/20220408-073442-root.json
  • 07:31 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
  • 07:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS buster
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24278 and previous config saved to /var/cache/conftool/dbconfig/20220408-072615-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 07:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 07:21 mmandere: depool cp6003 for reimage - T290005
  • 07:21 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24277 and previous config saved to /var/cache/conftool/dbconfig/20220408-071938-root.json
  • 07:12 mmandere: depool cp6011 for reimage - T290005
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24276 and previous config saved to /var/cache/conftool/dbconfig/20220408-070434-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P24275 and previous config saved to /var/cache/conftool/dbconfig/20220408-064930-root.json
  • 06:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P24274 and previous config saved to /var/cache/conftool/dbconfig/20220408-063426-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P24273 and previous config saved to /var/cache/conftool/dbconfig/20220408-061922-root.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P24272 and previous config saved to /var/cache/conftool/dbconfig/20220408-051044-root.json
  • 02:30 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: security updates - bking@cumin1001 - T304938
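
The cookbook entries above share a regular shape (`HH:MM actor: START|END (RESULT) - Cookbook name (exit_code=N) details`), so runs that did not end in PASS, such as the 09:54 reimage of dumpsdata1006, can be picked out mechanically. The following is a minimal sketch, assuming entries are available as plain text lines formatted exactly as on this page. The regular expressions and the names `CookbookEvent`, `parse_cookbook_event`, and `unsuccessful_runs` are illustrative assumptions, not part of Spicerack, Stashbot, or any Wikimedia tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed entry shape, inferred from this page: optional bullet, HH:MM, actor, message.
ENTRY_RE = re.compile(r"^\s*(?:•\s*)?(\d{2}:\d{2}) ([^\s:]+): (.*)$")
# Assumed cookbook message shape: START, or END with a result and an exit code.
COOKBOOK_RE = re.compile(
    r"^(START|END) (?:\((\w+)\) )?- Cookbook (\S+)"
    r"(?: \(exit_code=(\d+)\))?\s*(.*)$"
)


class CookbookEvent(NamedTuple):
    """One parsed cookbook log entry (hypothetical helper type)."""
    time: str
    actor: str
    phase: str                # "START" or "END"
    result: Optional[str]     # e.g. "PASS", "FAIL", "ERROR"; None for START
    cookbook: str             # e.g. "sre.hosts.reimage"
    exit_code: Optional[int]  # None for START
    detail: str               # remainder of the message, unparsed


def parse_cookbook_event(line: str) -> Optional[CookbookEvent]:
    """Return a CookbookEvent, or None if the line is not a cookbook entry."""
    entry = ENTRY_RE.match(line)
    if entry is None:
        return None
    time, actor, message = entry.groups()
    run = COOKBOOK_RE.match(message)
    if run is None:
        return None
    phase, result, cookbook, code, detail = run.groups()
    exit_code = int(code) if code is not None else None
    return CookbookEvent(time, actor, phase, result, cookbook, exit_code, detail)


def unsuccessful_runs(lines: Iterable[str]) -> List[CookbookEvent]:
    """Collect END entries whose result is anything other than PASS."""
    events = (parse_cookbook_event(line) for line in lines)
    return [e for e in events if e and e.phase == "END" and e.result != "PASS"]


if __name__ == "__main__":
    sample = [
        "  • 09:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage "
        "(exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS buster",
        "  • 09:48 jynus@cumin1001: START - Cookbook sre.hosts.reimage "
        "for host dbprov1003.eqiad.wmnet with OS bullseye",
    ]
    for event in unsuccessful_runs(sample):
        print(event.time, event.cookbook, event.result, event.exit_code)
```

Entries that are not cookbook runs (dbctl commits, conftool actions, free-text notes) are skipped rather than treated as errors, since the log mixes several entry kinds.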

2022-04-07

  • 22:18 ejegg: restarted fundraising scheduled jobs
  • 22:08 ejegg: updated fundraising CiviCRM from 7b7b284d to a90a6709
  • 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 21:46 ejegg: disabled fundraising scheduled jobs for CiviCRM upgrade
  • 21:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1101.eqiad.wmnet with OS bullseye
  • 21:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1100.eqiad.wmnet with OS bullseye
  • 21:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1102.eqiad.wmnet with OS bullseye
  • 21:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1099.eqiad.wmnet with OS bullseye
  • 21:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1101.eqiad.wmnet with reason: host reimage
  • 21:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1100.eqiad.wmnet with reason: host reimage
  • 21:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1098.eqiad.wmnet with OS bullseye
  • 21:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1102.eqiad.wmnet with reason: host reimage
  • 21:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1099.eqiad.wmnet with reason: host reimage
  • 21:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1097.eqiad.wmnet with OS bullseye
  • 21:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1102.eqiad.wmnet with reason: host reimage
  • 21:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1100.eqiad.wmnet with reason: host reimage
  • 21:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1101.eqiad.wmnet with reason: host reimage
  • 21:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1099.eqiad.wmnet with reason: host reimage
  • 21:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1096.eqiad.wmnet with OS bullseye
  • 21:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1098.eqiad.wmnet with reason: host reimage
  • 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1097.eqiad.wmnet with reason: host reimage
  • 20:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1098.eqiad.wmnet with reason: host reimage
  • 20:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1095.eqiad.wmnet with OS bullseye
  • 20:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1097.eqiad.wmnet with reason: host reimage
  • 20:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1102.eqiad.wmnet with OS bullseye
  • 20:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1096.eqiad.wmnet with reason: host reimage
  • 20:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1100.eqiad.wmnet with OS bullseye
  • 20:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1101.eqiad.wmnet with OS bullseye
  • 20:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1099.eqiad.wmnet with OS bullseye
  • 20:54 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS buster
  • 20:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1091.eqiad.wmnet with OS bullseye
  • 20:54 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS buster
  • 20:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1096.eqiad.wmnet with reason: host reimage
  • 20:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1095.eqiad.wmnet with reason: host reimage
  • 20:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1098.eqiad.wmnet with OS bullseye
  • 20:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS buster
  • 20:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1094.eqiad.wmnet with OS bullseye
  • 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1095.eqiad.wmnet with reason: host reimage
  • 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1097.eqiad.wmnet with OS bullseye
  • 20:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1092.eqiad.wmnet with OS bullseye
  • 20:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
  • 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1096.eqiad.wmnet with OS bullseye
  • 20:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1090.eqiad.wmnet with OS bullseye
  • 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS buster
  • 20:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1091.eqiad.wmnet with reason: host reimage
  • 20:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1094.eqiad.wmnet with reason: host reimage
  • 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1092.eqiad.wmnet with reason: host reimage
  • 20:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1095.eqiad.wmnet with OS bullseye
  • 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1089.eqiad.wmnet with OS bullseye
  • 20:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1094.eqiad.wmnet with reason: host reimage
  • 20:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1090.eqiad.wmnet with reason: host reimage
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1092.eqiad.wmnet with reason: host reimage
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1091.eqiad.wmnet with reason: host reimage
  • 20:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1090.eqiad.wmnet with reason: host reimage
  • 20:26 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: security updates - bking@cumin1001 - T304938
  • 20:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1089.eqiad.wmnet with reason: host reimage
  • 20:24 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 20:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1089.eqiad.wmnet with reason: host reimage
  • 20:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1094.eqiad.wmnet with OS bullseye
  • 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1092.eqiad.wmnet with OS bullseye
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1091.eqiad.wmnet with OS bullseye
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1090.eqiad.wmnet with OS bullseye
  • 20:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1089.eqiad.wmnet with OS bullseye
  • 20:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1009.eqiad.wmnet
  • 20:03 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
  • 20:02 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
  • 19:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1009.eqiad.wmnet
  • 19:57 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1008.eqiad.wmnet
  • 19:57 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
  • 19:47 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 19:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 19:46 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 19:46 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 19:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1008.eqiad.wmnet
  • 19:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1102.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:29 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
  • 19:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1099.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1098.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1102.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1097.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1096.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1095.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:04 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1099.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1098.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1097.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1096.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1095.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T304938
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1094.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1092.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1091.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1089.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1090.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:59 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1027.eqiad.wmnet
  • 18:53 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1028.eqiad.wmnet
  • 18:48 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1027.eqiad.wmnet
  • 18:48 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1029.eqiad.wmnet
  • 18:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1094.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1092.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1091.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1090.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:43 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1028.eqiad.wmnet
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host elastic1089.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:43 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1030.eqiad.wmnet
  • 18:39 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1029.eqiad.wmnet
  • 18:38 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1031.eqiad.wmnet
  • 18:35 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1030.eqiad.wmnet
  • 18:34 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1032.eqiad.wmnet
  • 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:32 ryankemper: [Elastic] Pooled `elastic1052` (likely was erroneously left depooled after https://phabricator.wikimedia.org/P19885)
  • 18:29 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1031.eqiad.wmnet
  • 18:29 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1033.eqiad.wmnet
  • 18:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:25 razzi@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1032.eqiad.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1034.eqiad.wmnet
  • 18:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1033.eqiad.wmnet
  • 18:17 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1035.eqiad.wmnet
  • 18:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet with OS bullseye
  • 18:09 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
  • 18:08 ryankemper: [WCQS Deploy] Successful test query placed on commons-query.wikimedia.org, there are no relevant criticals in Icinga, and Grafana looks good. WCQS deploy complete
  • 18:08 ryankemper: [WCQS Deploy] Restarted `wcqs-updater` across all hosts
  • 18:08 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1034.eqiad.wmnet
  • 18:07 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1035.eqiad.wmnet
  • 18:07 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1037.eqiad.wmnet
  • 18:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 18:02 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1036.eqiad.wmnet
  • 18:01 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0d95eca] (wcqs): Deploy 0.3.110 to WCQS (duration: 01m 58s)
  • 18:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet with OS bullseye
  • 18:00 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.110` to `wcqs1003.eqiad.wmnet`, proceeding to rest of fleet
  • 17:59 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0d95eca] (wcqs): Deploy 0.3.110 to WCQS
  • 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T305300)', diff saved to https://phabricator.wikimedia.org/P24270 and previous config saved to /var/cache/conftool/dbconfig/20220407-175730-ladsgroup.json
  • 17:52 mutante: rebooting wtp103* servers
  • 17:52 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1037.eqiad.wmnet
  • 17:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 17:50 ryankemper: T293862 Removed touched files so that it'll be easier to see when the new jvmquake threshold is crossed: `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-public' "rm -fv '/tmp/wdqs_blazegraph_jvmquake_warn_gc'"`
  • 17:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 17:46 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1036.eqiad.wmnet
  • 17:44 ryankemper: T293862 Rolling restart of wdqs public is complete; new jvmquake settings have been picked up on wdqs public hosts: `-agentpath:/usr/lib/libjvmquake.so=1000,5,0,warn=60,touch=/tmp/wdqs_blazegraph_jvmquake_warn_gc`
  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P24269 and previous config saved to /var/cache/conftool/dbconfig/20220407-174224-ladsgroup.json
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 17:40 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 17:40 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 17:40 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 17:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 17:38 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0d95eca]: 0.3.110 (duration: 06m 21s)
  • 17:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 17:32 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.110` on canary `wdqs1003`; proceeding to rest of fleet
  • 17:31 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0d95eca]: 0.3.110
  • 17:31 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.110`. Pre-deploy tests passing on canary `wdqs1003`
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 17:31 ryankemper: [WDQS] T293862 Need to do a rolling restart of wdqs public; going to just roll a full deploy since it's equal work
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P24268 and previous config saved to /var/cache/conftool/dbconfig/20220407-172719-ladsgroup.json
  • 17:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1004.eqiad.wmnet with OS bullseye
  • 17:17 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1003.eqiad.wmnet with OS bullseye
  • 17:16 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 17:14 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:14 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T305300)', diff saved to https://phabricator.wikimedia.org/P24267 and previous config saved to /var/cache/conftool/dbconfig/20220407-171211-ladsgroup.json
  • 17:12 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 17:11 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T305300)', diff saved to https://phabricator.wikimedia.org/P24266 and previous config saved to /var/cache/conftool/dbconfig/20220407-171105-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24265 and previous config saved to /var/cache/conftool/dbconfig/20220407-171052-ladsgroup.json
  • 17:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 17:09 herron@cumin1001: END (FAIL) - Cookbook sre.kafka.reboot-workers (exit_code=99) for Kafka logging-codfw cluster: Reboot kafka nodes
  • 17:08 herron@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka logging-codfw cluster: Reboot kafka nodes
  • 17:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 17:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 16:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P24264 and previous config saved to /var/cache/conftool/dbconfig/20220407-165547-ladsgroup.json
  • 16:50 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T304938
  • 16:49 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T304938
  • 16:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 16:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
  • 16:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P24263 and previous config saved to /var/cache/conftool/dbconfig/20220407-164042-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24262 and previous config saved to /var/cache/conftool/dbconfig/20220407-162537-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24261 and previous config saved to /var/cache/conftool/dbconfig/20220407-162430-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24260 and previous config saved to /var/cache/conftool/dbconfig/20220407-162421-ladsgroup.json
  • 16:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 16:17 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P24259 and previous config saved to /var/cache/conftool/dbconfig/20220407-160916-ladsgroup.json
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P24258 and previous config saved to /var/cache/conftool/dbconfig/20220407-155410-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24257 and previous config saved to /var/cache/conftool/dbconfig/20220407-153905-ladsgroup.json
  • 15:21 mmandere: pool cp6004 with HAProxy as TLS termination layer - T290005
  • 15:14 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1171.eqiad.wmnet with OS bullseye
  • 15:12 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS buster
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24256 and previous config saved to /var/cache/conftool/dbconfig/20220407-150640-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 15:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24255 and previous config saved to /var/cache/conftool/dbconfig/20220407-150632-ladsgroup.json
  • 14:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: host reimage
  • 14:56 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: host reimage
  • 14:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2144.codfw.wmnet with reason: Rebooting for T303174
  • 14:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2144.codfw.wmnet with reason: Rebooting for T303174
  • 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2143 (re)pooling @ 100%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P24254 and previous config saved to /var/cache/conftool/dbconfig/20220407-145455-kormat.json
  • 14:51 kormat@cumin1001: dbctl commit (dc=all): 'db2143 (re)pooling @ 50%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P24253 and previous config saved to /var/cache/conftool/dbconfig/20220407-145139-kormat.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P24252 and previous config saved to /var/cache/conftool/dbconfig/20220407-145127-ladsgroup.json
  • 14:44 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1171.eqiad.wmnet with OS bullseye
  • 14:44 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
  • 14:41 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
  • 14:36 kormat@cumin1001: dbctl commit (dc=all): 'db2143 (re)pooling @ 25%: Reboot T303174', diff saved to https://phabricator.wikimedia.org/P24251 and previous config saved to /var/cache/conftool/dbconfig/20220407-143635-kormat.json
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P24250 and previous config saved to /var/cache/conftool/dbconfig/20220407-143622-ladsgroup.json
  • 14:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2143.codfw.wmnet with reason: Rebooting for T303174
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2143.codfw.wmnet with reason: Rebooting for T303174
  • 14:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2131.codfw.wmnet with reason: Rebooting for T303174
  • 14:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2131.codfw.wmnet with reason: Rebooting for T303174
  • 14:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS buster
  • 14:22 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2141.codfw.wmnet with OS bullseye
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24249 and previous config saved to /var/cache/conftool/dbconfig/20220407-142117-ladsgroup.json
  • 14:19 mmandere: depool cp6004 for reimage - T290005
  • 14:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2115.codfw.wmnet with reason: Rebooting for T303174
  • 14:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2115.codfw.wmnet with reason: Rebooting for T303174
  • 14:13 mmandere: pool cp6012 with HAProxy as TLS termination layer - T290005
  • 14:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
  • 14:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2025.codfw.wmnet with reason: Rebooting for T303174
  • 14:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2025.codfw.wmnet with reason: Rebooting for T303174
  • 14:08 mmandere: pool cp6005 with HAProxy as TLS termination layer - T290005
  • 14:06 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS buster
  • 14:06 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2141.codfw.wmnet with reason: host reimage
  • 14:04 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T304938
  • 14:03 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2141.codfw.wmnet with reason: host reimage
  • 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2034.codfw.wmnet with reason: Rebooting for T303174
  • 14:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2034.codfw.wmnet with reason: Rebooting for T303174
  • 13:55 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1150.eqiad.wmnet with OS bullseye
  • 13:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2029.codfw.wmnet with reason: Rebooting for T303174
  • 13:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2029.codfw.wmnet with reason: Rebooting for T303174
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T305300)', diff saved to https://phabricator.wikimedia.org/P24248 and previous config saved to /var/cache/conftool/dbconfig/20220407-135052-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24247 and previous config saved to /var/cache/conftool/dbconfig/20220407-135044-ladsgroup.json
  • 13:49 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2141.codfw.wmnet with OS bullseye
  • 13:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2027.codfw.wmnet with reason: Rebooting for T303174
  • 13:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2027.codfw.wmnet with reason: Rebooting for T303174
  • 13:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 13:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2033.codfw.wmnet with reason: Rebooting for T303174
  • 13:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2033.codfw.wmnet with reason: Rebooting for T303174
  • 13:39 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: host reimage
  • 13:37 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 13:36 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: host reimage
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24246 and previous config saved to /var/cache/conftool/dbconfig/20220407-133539-ladsgroup.json
  • 13:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2031.codfw.wmnet with reason: Rebooting for T303174
  • 13:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2031.codfw.wmnet with reason: Rebooting for T303174
  • 13:33 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
  • 13:30 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
  • 13:29 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2139.codfw.wmnet with OS bullseye
  • 13:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2026.codfw.wmnet with reason: Rebooting for T303174
  • 13:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2026.codfw.wmnet with reason: Rebooting for T303174
  • 13:24 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1150.eqiad.wmnet with OS bullseye
  • 13:20 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P24245 and previous config saved to /var/cache/conftool/dbconfig/20220407-132034-ladsgroup.json
  • 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2032.codfw.wmnet with reason: Rebooting for T303174
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2032.codfw.wmnet with reason: Rebooting for T303174
  • 13:14 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1145.eqiad.wmnet with OS bullseye
  • 13:13 mmandere: depool cp6012 for reimage - T290005
  • 13:13 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2139.codfw.wmnet with reason: host reimage
  • 13:12 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS buster
  • 13:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2030.codfw.wmnet with reason: Rebooting for T303174
  • 13:10 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2139.codfw.wmnet with reason: host reimage
  • 13:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2030.codfw.wmnet with reason: Rebooting for T303174
  • 13:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubemaster2001.codfw.wmnet with reason: reimage
  • 13:08 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubemaster2001.codfw.wmnet with reason: reimage
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24244 and previous config saved to /var/cache/conftool/dbconfig/20220407-130529-ladsgroup.json
  • 13:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2028.codfw.wmnet with reason: Rebooting for T303174
  • 13:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2028.codfw.wmnet with reason: Rebooting for T303174
  • 12:58 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: host reimage
  • 12:58 mmandere: depool cp6005 for reimage - T290005
  • 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2104.codfw.wmnet with reason: Rebooting for T303174
  • 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2104.codfw.wmnet with reason: Rebooting for T303174
  • 12:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 12:57 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 12:55 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: host reimage
  • 12:55 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2139.codfw.wmnet with OS bullseye
  • 12:55 mmandere: pool cp6013 with HAProxy as TLS termination layer - T290005
  • 12:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
  • 12:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2135.codfw.wmnet with reason: Rebooting for T303174
  • 12:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2135.codfw.wmnet with reason: Rebooting for T303174
  • 12:49 akosiaris: sudo gnt-cluster modify -H kvm:migration_downtime=3000 for ganeti01.svc.codfw.wmnet and ganeti01.svc.eqiad.wmnet to combat some logstash VM migration issues.
  • 12:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2134.codfw.wmnet with reason: Rebooting for T303174
  • 12:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2134.codfw.wmnet with reason: Rebooting for T303174
  • 12:44 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1145.eqiad.wmnet with OS bullseye
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2133.codfw.wmnet with reason: Rebooting for T303174
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2133.codfw.wmnet with reason: Rebooting for T303174
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2078,2133].codfw.wmnet with reason: Rebooting primary T303174
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2078,2133].codfw.wmnet with reason: Rebooting primary T303174
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2132.codfw.wmnet with reason: Rebooting for T303174
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2132.codfw.wmnet with reason: Rebooting for T303174
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2078,2132].codfw.wmnet with reason: Rebooting primary T303174
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2078,2132].codfw.wmnet with reason: Rebooting primary T303174
  • 12:32 mmandere: pool cp3051 with HAProxy as TLS termination layer - T290005
  • 12:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3051.esams.wmnet with OS buster
  • 12:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2096.codfw.wmnet with reason: Rebooting for T303174
  • 12:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2096.codfw.wmnet with reason: Rebooting for T303174
  • 12:19 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:30:00 on db2096.codfw.wmnet with reason: Rebooting for T303174
  • 12:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2096.codfw.wmnet with reason: Rebooting for T303174
  • 12:08 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 12:06 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T305300)', diff saved to https://phabricator.wikimedia.org/P24243 and previous config saved to /var/cache/conftool/dbconfig/20220407-120514-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24242 and previous config saved to /var/cache/conftool/dbconfig/20220407-120507-ladsgroup.json
  • 12:03 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 12:03 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24241 and previous config saved to /var/cache/conftool/dbconfig/20220407-115002-ladsgroup.json
  • 11:49 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1140.eqiad.wmnet with OS bullseye
  • 11:46 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2101.codfw.wmnet with OS bullseye
  • 11:45 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
  • 11:35 mmandere: depool cp6013 for reimage - T290005
  • 11:35 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1140.eqiad.wmnet with reason: host reimage
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P24240 and previous config saved to /var/cache/conftool/dbconfig/20220407-113455-ladsgroup.json
  • 11:34 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3051.esams.wmnet with OS buster
  • 11:32 jforrester@deploy1002: Finished deploy [integration/docroot@d88e2fa]: d88e2fa19fd6 [WikiLambda] Fix link typo and re-group/re-word other links (duration: 00m 09s)
  • 11:32 jforrester@deploy1002: Started deploy [integration/docroot@d88e2fa]: d88e2fa19fd6 [WikiLambda] Fix link typo and re-group/re-word other links
  • 11:31 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2101.codfw.wmnet with reason: host reimage
  • 11:31 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1140.eqiad.wmnet with reason: host reimage
  • 11:28 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2101.codfw.wmnet with reason: host reimage
  • 11:23 mmandere: depool cp3051 for reimage - T290005
  • 11:23 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1140.eqiad.wmnet with OS bullseye
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24239 and previous config saved to /var/cache/conftool/dbconfig/20220407-111950-ladsgroup.json
  • 11:17 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2101.codfw.wmnet with OS bullseye
  • 11:17 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1139.eqiad.wmnet with OS bullseye
  • 11:16 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:15 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:12 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2100.codfw.wmnet with OS bullseye
  • 11:03 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1139.eqiad.wmnet with reason: host reimage
  • 10:59 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1139.eqiad.wmnet with reason: host reimage
  • 10:59 mmandere: pool cp3053 with HAProxy as TLS termination layer - T290005
  • 10:58 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2100.codfw.wmnet with reason: host reimage
  • 10:55 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2100.codfw.wmnet with reason: host reimage
  • 10:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3053.esams.wmnet with OS buster
  • 10:51 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS bullseye
  • 10:45 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:44 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2100.codfw.wmnet with OS bullseye
  • 10:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:41 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1116.eqiad.wmnet with OS bullseye
  • 10:40 mmandere: pool cp6006 with HAProxy as TLS termination layer - T290005
  • 10:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubemaster2002.codfw.wmnet with reason: reimage
  • 10:37 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubemaster2002.codfw.wmnet with reason: reimage
  • 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24238 and previous config saved to /var/cache/conftool/dbconfig/20220407-103739-ladsgroup.json
  • 10:37 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS buster
  • 10:36 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:36 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:35 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 10:35 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24237 and previous config saved to /var/cache/conftool/dbconfig/20220407-102821-root.json
  • 10:28 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: host reimage
  • 10:27 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2099.codfw.wmnet with OS bullseye
  • 10:25 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:24 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: host reimage
  • 10:24 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24236 and previous config saved to /var/cache/conftool/dbconfig/20220407-102234-ladsgroup.json
  • 10:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T305300)', diff saved to https://phabricator.wikimedia.org/P24235 and previous config saved to /var/cache/conftool/dbconfig/20220407-101936-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T305300)', diff saved to https://phabricator.wikimedia.org/P24234 and previous config saved to /var/cache/conftool/dbconfig/20220407-101928-ladsgroup.json
  • 10:16 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1116.eqiad.wmnet with OS bullseye
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24233 and previous config saved to /var/cache/conftool/dbconfig/20220407-101318-root.json
  • 10:12 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2099.codfw.wmnet with reason: host reimage
  • 10:09 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2099.codfw.wmnet with reason: host reimage
  • 10:08 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 10:08 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1102.eqiad.wmnet with OS bullseye
  • 10:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1007.eqiad.wmnet
  • 10:08 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 10:07 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24232 and previous config saved to /var/cache/conftool/dbconfig/20220407-100729-ladsgroup.json
  • 10:06 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 10:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:05 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:04 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P24231 and previous config saved to /var/cache/conftool/dbconfig/20220407-100423-ladsgroup.json
  • 10:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
  • 10:00 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
  • 10:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
  • 09:58 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2099.codfw.wmnet with OS bullseye
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24230 and previous config saved to /var/cache/conftool/dbconfig/20220407-095814-root.json
  • 09:56 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24229 and previous config saved to /var/cache/conftool/dbconfig/20220407-095624-root.json
  • 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1007.eqiad.wmnet
  • 09:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1102.eqiad.wmnet with reason: host reimage
  • 09:54 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24228 and previous config saved to /var/cache/conftool/dbconfig/20220407-095224-ladsgroup.json
  • 09:51 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1102.eqiad.wmnet with reason: host reimage
  • 09:50 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:50 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P24227 and previous config saved to /var/cache/conftool/dbconfig/20220407-094917-ladsgroup.json
  • 09:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:43 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1102.eqiad.wmnet with OS bullseye
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P24226 and previous config saved to /var/cache/conftool/dbconfig/20220407-094310-root.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24225 and previous config saved to /var/cache/conftool/dbconfig/20220407-094120-root.json
  • 09:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2129.codfw.wmnet with reason: Rebooting for T303174
  • 09:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2129.codfw.wmnet with reason: Rebooting for T303174
  • 09:39 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS buster
  • 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 09:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 09:35 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2098.codfw.wmnet with OS bullseye
  • 09:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 09:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Rebooting primary T303174
  • 09:34 mmandere: depool cp6006 for reimage - T290005
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T305300)', diff saved to https://phabricator.wikimedia.org/P24224 and previous config saved to /var/cache/conftool/dbconfig/20220407-093412-ladsgroup.json
  • 09:33 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS buster
  • 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2123.codfw.wmnet with reason: Rebooting for T303174
  • 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2123.codfw.wmnet with reason: Rebooting for T303174
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24223 and previous config saved to /var/cache/conftool/dbconfig/20220407-092616-root.json
  • 09:25 mmandere: depool cp3053 for reimage - T290005
  • 09:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2105.codfw.wmnet with reason: Rebooting for T303174
  • 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2105.codfw.wmnet with reason: Rebooting for T303174
  • 09:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Rebooting primary T303174
  • 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Rebooting primary T303174
  • 09:20 mmandere: pool cp6014 with HAProxy as TLS termination layer - T290005
  • 09:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
  • 09:14 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
  • 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2152.codfw.wmnet with reason: Rebooting for T303174
  • 09:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2152.codfw.wmnet with reason: Rebooting for T303174
  • 09:11 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: host reimage
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P24222 and previous config saved to /var/cache/conftool/dbconfig/20220407-091112-root.json
  • 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2150.codfw.wmnet with reason: Rebooting for T303174
  • 09:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2150.codfw.wmnet with reason: Rebooting for T303174
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T305300)', diff saved to https://phabricator.wikimedia.org/P24221 and previous config saved to /var/cache/conftool/dbconfig/20220407-090201-ladsgroup.json
  • 09:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:01 mmandere: pool cp3050 with HAProxy as TLS termination layer - T290005
  • 09:00 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2098.codfw.wmnet with OS bullseye
  • 08:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2122.codfw.wmnet with reason: Rebooting for T303174
  • 08:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2122.codfw.wmnet with reason: Rebooting for T303174
  • 08:56 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3050.esams.wmnet with OS buster
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P24220 and previous config saved to /var/cache/conftool/dbconfig/20220407-085608-root.json
  • 08:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24219 and previous config saved to /var/cache/conftool/dbconfig/20220407-084140-marostegui.json
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P24218 and previous config saved to /var/cache/conftool/dbconfig/20220407-084103-root.json
  • 08:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24217 and previous config saved to /var/cache/conftool/dbconfig/20220407-083209-marostegui.json
  • 08:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 08:27 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
  • 08:26 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 08:23 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
  • 08:23 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24216 and previous config saved to /var/cache/conftool/dbconfig/20220407-081910-ladsgroup.json
  • 08:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P24215 and previous config saved to /var/cache/conftool/dbconfig/20220407-081704-marostegui.json
  • 08:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:13 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.6 refs T305212
  • 08:09 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P24214 and previous config saved to /var/cache/conftool/dbconfig/20220407-080159-marostegui.json
  • 08:00 mmandere: depool cp6014 for reimage - T290005
  • 07:55 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3050.esams.wmnet with OS buster
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24213 and previous config saved to /var/cache/conftool/dbconfig/20220407-074654-marostegui.json
  • 07:44 mmandere: depool cp3050 for reimage - T290005
  • 07:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163', diff saved to https://phabricator.wikimedia.org/P24212 and previous config saved to /var/cache/conftool/dbconfig/20220407-073013-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300775)', diff saved to https://phabricator.wikimedia.org/P24211 and previous config saved to /var/cache/conftool/dbconfig/20220407-072813-marostegui.json
  • 07:17 hashar: CI and Gerrit are back up
  • 07:14 hashar: gerrit1001.wikimedia.org: restarted apache2 service
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24210 and previous config saved to /var/cache/conftool/dbconfig/20220407-071308-marostegui.json
  • 07:10 hashar: Restarting contint2001.wikimedia.org
  • 07:10 hashar: Restarting gerrit1001.wikimedia.org
  • 07:02 hashar: Restarting contint1001.wikimedia.org
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24209 and previous config saved to /var/cache/conftool/dbconfig/20220407-065803-marostegui.json
  • 06:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 06:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 06:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300775)', diff saved to https://phabricator.wikimedia.org/P24208 and previous config saved to /var/cache/conftool/dbconfig/20220407-064258-marostegui.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24207 and previous config saved to /var/cache/conftool/dbconfig/20220407-062736-marostegui.json
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24206 and previous config saved to /var/cache/conftool/dbconfig/20220407-062728-marostegui.json
  • 06:27 ryankemper: [Elastic] Manually restarted elasticsearch exporters on `elastic2043` and `elastic2058`
  • 06:25 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - ryankemper@cumin1001 - T304938
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P24205 and previous config saved to /var/cache/conftool/dbconfig/20220407-061223-marostegui.json
  • 06:00 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - ryankemper@cumin1001 - T304938
  • 05:58 ryankemper: [Elastic] Manually restarted elasticsearch exporters on `cloudelastic1004` and `elastic2054`
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P24203 and previous config saved to /var/cache/conftool/dbconfig/20220407-055718-marostegui.json
  • 05:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - ryankemper@cumin1001 - T304938
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24202 and previous config saved to /var/cache/conftool/dbconfig/20220407-054213-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2076 db2086:3317 db2086:3318 db2107 db2137:3314 db2137:3315 db2143 db2147 es2029 es2030 T305469', diff saved to https://phabricator.wikimedia.org/P24201 and previous config saved to /var/cache/conftool/dbconfig/20220407-050149-root.json
  • 04:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24200 and previous config saved to /var/cache/conftool/dbconfig/20220407-044158-marostegui.json
  • 04:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 04:29 ryankemper: [Elastic] for future reference, we still need to fix the fact that we haven't told systemd that the prometheus-wmf-elasticsearch exporters need to start after the actual elasticsearch service
  • 04:13 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - ryankemper@cumin1001 - T304938
  • 04:13 ryankemper: [Elastic] Beginning rolling reboot of codfw elastic to apply kernel security updates: `ryankemper@cumin1001:~$ sudo -E cookbook sre.elasticsearch.rolling-operation search_codfw "codfw cluster reboot" --reboot --nodes-per-run 3 --start-datetime 2022-04-07T04:09:05 --task-id T304938`
  • 02:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24199 and previous config saved to /var/cache/conftool/dbconfig/20220407-024347-marostegui.json
  • 02:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P24198 and previous config saved to /var/cache/conftool/dbconfig/20220407-022842-marostegui.json
  • 02:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P24197 and previous config saved to /var/cache/conftool/dbconfig/20220407-021337-marostegui.json
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24196 and previous config saved to /var/cache/conftool/dbconfig/20220407-015832-marostegui.json
  • 00:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T297189)', diff saved to https://phabricator.wikimedia.org/P24195 and previous config saved to /var/cache/conftool/dbconfig/20220407-005817-marostegui.json
  • 00:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T297189)', diff saved to https://phabricator.wikimedia.org/P24194 and previous config saved to /var/cache/conftool/dbconfig/20220407-005809-marostegui.json
  • 00:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P24193 and previous config saved to /var/cache/conftool/dbconfig/20220407-004304-marostegui.json
  • 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P24192 and previous config saved to /var/cache/conftool/dbconfig/20220407-002759-marostegui.json
  • 00:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T297189)', diff saved to https://phabricator.wikimedia.org/P24191 and previous config saved to /var/cache/conftool/dbconfig/20220407-001254-marostegui.json
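
Note on the 04:29 [Elastic] entry above: the start-ordering gap it describes is the kind of thing a systemd drop-in usually covers. The fragment below is only a sketch. The unit names `prometheus-wmf-elasticsearch-exporter.service` and `elasticsearch.service` and the drop-in file name are assumptions, not taken from this log, and the real hosts may run several instance-specific units.

```ini
# Hypothetical drop-in path (unit and file names are assumptions):
# /etc/systemd/system/prometheus-wmf-elasticsearch-exporter.service.d/ordering.conf

[Unit]
# Start the exporter only after the Elasticsearch unit has been started
# when both are part of the same start transaction (for example at boot).
After=elasticsearch.service
# Soft dependency: starting the exporter also requests Elasticsearch,
# but the exporter is not stopped if Elasticsearch fails.
Wants=elasticsearch.service

[Service]
# After= does not wait for Elasticsearch to accept connections,
# so let systemd retry the exporter if it exits early.
Restart=on-failure
RestartSec=10
```

A drop-in like this takes effect after `systemctl daemon-reload`. `After=` only controls ordering and `Wants=` only pulls the other unit in, so the two are normally used together.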

2022-04-06

  • 23:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:47 krinkle@deploy1002: Synchronized w/static.php: Ic87a8a3d00db (duration: 00m 53s)
  • 23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T297189)', diff saved to https://phabricator.wikimedia.org/P24190 and previous config saved to /var/cache/conftool/dbconfig/20220406-232126-marostegui.json
  • 23:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 23:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24189 and previous config saved to /var/cache/conftool/dbconfig/20220406-232118-marostegui.json
  • 23:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:10 krinkle@deploy1002: Synchronized w/static: I5a05f4728 (duration: 00m 54s)
  • 23:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P24188 and previous config saved to /var/cache/conftool/dbconfig/20220406-230613-marostegui.json
  • 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300775)', diff saved to https://phabricator.wikimedia.org/P24187 and previous config saved to /var/cache/conftool/dbconfig/20220406-230118-marostegui.json
  • 23:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 23:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 23:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300775)', diff saved to https://phabricator.wikimedia.org/P24186 and previous config saved to /var/cache/conftool/dbconfig/20220406-230110-marostegui.json
  • 22:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P24185 and previous config saved to /var/cache/conftool/dbconfig/20220406-225108-marostegui.json
  • 22:49 mutante: parse2004, parse2003 - rebooting
  • 22:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24184 and previous config saved to /var/cache/conftool/dbconfig/20220406-224605-marostegui.json
  • 22:42 mutante: parse2006, parse2005 - rebooting
  • 22:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24183 and previous config saved to /var/cache/conftool/dbconfig/20220406-223603-marostegui.json
  • 22:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 22:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24182 and previous config saved to /var/cache/conftool/dbconfig/20220406-223100-marostegui.json
  • 22:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 22:26 mutante: parse2007, parse2008 - rebooting
  • 22:16 mutante: parse2009, parse2010 - rebooting
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300775)', diff saved to https://phabricator.wikimedia.org/P24181 and previous config saved to /var/cache/conftool/dbconfig/20220406-221555-marostegui.json
  • 22:14 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 22:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
  • 22:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
  • 22:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
  • 22:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 22:05 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 22:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
  • 22:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
  • 22:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
  • 21:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:57 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:57 mutante: parse2011, parse2012 - rebooting
  • 21:51 mutante: parse2013, parse2014 - rebooting
  • 21:46 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1039.eqiad.wmnet
  • 21:42 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@a1c5c6f]: (no justification provided) (duration: 04m 34s)
  • 21:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 21:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 21:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 21:38 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@a1c5c6f]: (no justification provided)
  • 21:37 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: security updates - bking@cumin1001 - T304938
  • 21:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24180 and previous config saved to /var/cache/conftool/dbconfig/20220406-213605-marostegui.json
  • 21:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24179 and previous config saved to /var/cache/conftool/dbconfig/20220406-213557-marostegui.json
  • 21:35 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1039.eqiad.wmnet
  • 21:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1038.eqiad.wmnet
  • 21:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:26 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1038.eqiad.wmnet
  • 21:26 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1037.eqiad.wmnet
  • 21:21 mutante: wtp1037,wtp1038,wtp1039 - rebooting sequentially
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P24178 and previous config saved to /var/cache/conftool/dbconfig/20220406-212052-marostegui.json
  • 21:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1037.eqiad.wmnet
  • 21:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1037.wmnet
  • 21:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=wtp1037.wmnet
  • 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P24177 and previous config saved to /var/cache/conftool/dbconfig/20220406-210545-marostegui.json
  • 21:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 21:03 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 20:56 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 20:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: security updates - bking@cumin1001 - T304938
  • 20:51 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 20:50 cjming: end of UTC late backport & config window
  • 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24176 and previous config saved to /var/cache/conftool/dbconfig/20220406-205040-marostegui.json
  • 20:46 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 20:38 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:38 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:38 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:35 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security updates - bking@cumin1001 - T304938
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 cjming@deploy1002: Synchronized php-1.39.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents/desktopWebUIActions.js: Backport: Update to 78eef14, rename viewportSize to viewportSizeBucket (T301391) (duration: 00m 55s)
  • 20:03 mutante: phabricator about to be rebooted - hang on
  • 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24174 and previous config saved to /var/cache/conftool/dbconfig/20220406-195925-marostegui.json
  • 19:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T297189)', diff saved to https://phabricator.wikimedia.org/P24173 and previous config saved to /var/cache/conftool/dbconfig/20220406-195917-marostegui.json
  • 19:59 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: reboot for maintenance
  • 19:58 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: reboot for maintenance
  • 19:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 19:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 19:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 19:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 19:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P24172 and previous config saved to /var/cache/conftool/dbconfig/20220406-194412-marostegui.json
  • 19:31 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 19:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 19:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P24171 and previous config saved to /var/cache/conftool/dbconfig/20220406-192907-marostegui.json
  • 19:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:23 rook@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudvirt1016.eqiad.wmnet
  • 19:23 rook@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1016.eqiad.wmnet
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T297189)', diff saved to https://phabricator.wikimedia.org/P24170 and previous config saved to /var/cache/conftool/dbconfig/20220406-191402-marostegui.json
  • 19:13 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 19:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host dse-k8s-worker1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host dse-k8s-worker1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host dse-k8s-worker1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host dse-k8s-worker1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T297189)', diff saved to https://phabricator.wikimedia.org/P24169 and previous config saved to /var/cache/conftool/dbconfig/20220406-183927-marostegui.json
  • 18:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 18:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T297189)', diff saved to https://phabricator.wikimedia.org/P24168 and previous config saved to /var/cache/conftool/dbconfig/20220406-183919-marostegui.json
  • 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P24167 and previous config saved to /var/cache/conftool/dbconfig/20220406-182414-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P24166 and previous config saved to /var/cache/conftool/dbconfig/20220406-180909-marostegui.json
  • 18:01 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 17:58 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T297189)', diff saved to https://phabricator.wikimedia.org/P24165 and previous config saved to /var/cache/conftool/dbconfig/20220406-175403-marostegui.json
  • 17:42 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 17:25 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2097.codfw.wmnet with OS bullseye
  • 17:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2120.codfw.wmnet with reason: Rebooting for T303174
  • 17:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2120.codfw.wmnet with reason: Rebooting for T303174
  • 17:11 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2097.codfw.wmnet with reason: host reimage
  • 17:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2118.codfw.wmnet with reason: Rebooting for T303174
  • 17:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2118.codfw.wmnet with reason: Rebooting for T303174
  • 17:08 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2097.codfw.wmnet with reason: host reimage
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T297189)', diff saved to https://phabricator.wikimedia.org/P24164 and previous config saved to /var/cache/conftool/dbconfig/20220406-170223-marostegui.json
  • 17:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:01 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2108.codfw.wmnet with reason: Rebooting for T303174
  • 17:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2108.codfw.wmnet with reason: Rebooting for T303174
  • 16:57 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2097.codfw.wmnet with OS bullseye
  • 16:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2124.codfw.wmnet with reason: Rebooting for T303174
  • 16:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2124.codfw.wmnet with reason: Rebooting for T303174
  • 16:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2117.codfw.wmnet with reason: Rebooting for T303174
  • 16:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2117.codfw.wmnet with reason: Rebooting for T303174
  • 16:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2114.codfw.wmnet with reason: Rebooting for T303174
  • 16:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2114.codfw.wmnet with reason: Rebooting for T303174
  • 16:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2128.codfw.wmnet with reason: Rebooting for T303174
  • 16:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2128.codfw.wmnet with reason: Rebooting for T303174
  • 16:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2113.codfw.wmnet with reason: Rebooting for T303174
  • 16:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2113.codfw.wmnet with reason: Rebooting for T303174
  • 16:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2111.codfw.wmnet with reason: Rebooting for T303174
  • 16:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2111.codfw.wmnet with reason: Rebooting for T303174
  • 16:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2147.codfw.wmnet with reason: Rebooting for T303174
  • 16:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2147.codfw.wmnet with reason: Rebooting for T303174
  • 16:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:02 mforns@deploy1002: Finished deploy [airflow-dags/analytics@b029f10]: (no justification provided) (duration: 00m 08s)
  • 16:02 mforns@deploy1002: Started deploy [airflow-dags/analytics@b029f10]: (no justification provided)
  • 15:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2140.codfw.wmnet with reason: Rebooting for T303174
  • 15:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2140.codfw.wmnet with reason: Rebooting for T303174
  • 15:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc2040.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2040.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:51 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3018fdb]: (no justification provided) (duration: 00m 07s)
  • 15:51 mforns@deploy1002: Started deploy [airflow-dags/analytics@3018fdb]: (no justification provided)
  • 15:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2137.codfw.wmnet with reason: Rebooting for T303174
  • 15:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2137.codfw.wmnet with reason: Rebooting for T303174
  • 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2136.codfw.wmnet with reason: Rebooting for T303174
  • 15:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2136.codfw.wmnet with reason: Rebooting for T303174
  • 15:33 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main'.
  • 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2119.codfw.wmnet with reason: Rebooting for T303174
  • 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2119.codfw.wmnet with reason: Rebooting for T303174
  • 15:31 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
  • 15:29 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
  • 15:28 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main'.
  • 15:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:15 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:11 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main'.
  • 15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:06 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:06 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@3018fdb]: (no justification provided) (duration: 00m 07s)
  • 15:06 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:06 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@3018fdb]: (no justification provided)
  • 15:06 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:04 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:04 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:02 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1006.eqiad.wmnet
  • 15:02 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:01 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@dc748fb]: (no justification provided) (duration: 00m 08s)
  • 15:01 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@dc748fb]: (no justification provided)
  • 14:58 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:58 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:57 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:57 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:57 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:57 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:55 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:55 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:55 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2106.codfw.wmnet with reason: Rebooting for T303174
  • 14:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2106.codfw.wmnet with reason: Rebooting for T303174
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:52 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:52 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1006.eqiad.wmnet
  • 14:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2149.codfw.wmnet with reason: Rebooting for T303174
  • 14:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2149.codfw.wmnet with reason: Rebooting for T303174
  • 14:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2127.codfw.wmnet with reason: Rebooting for T303174
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2127.codfw.wmnet with reason: Rebooting for T303174
  • 14:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 14:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 14:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24163 and previous config saved to /var/cache/conftool/dbconfig/20220406-143647-marostegui.json
  • 14:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2109.codfw.wmnet with reason: Rebooting for T303174
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2109.codfw.wmnet with reason: Rebooting for T303174
  • 14:27 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1005.eqiad.wmnet
  • 14:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2148.codfw.wmnet with reason: Rebooting for T303174
  • 14:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2148.codfw.wmnet with reason: Rebooting for T303174
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P24162 and previous config saved to /var/cache/conftool/dbconfig/20220406-142142-marostegui.json
  • 14:21 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1005.eqiad.wmnet
  • 14:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:08 mmandere: pool cp4021 with HAProxy as TLS termination layer - T290005
  • 14:06 moritzm: installing webperf2004 T305460
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P24160 and previous config saved to /var/cache/conftool/dbconfig/20220406-140637-marostegui.json
  • 14:05 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:02 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host aqs1004.eqiad.wmnet
  • 14:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
  • 13:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2138.codfw.wmnet with reason: Rebooting for T303174
  • 13:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2138.codfw.wmnet with reason: Rebooting for T303174
  • 13:55 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:55 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:53 moritzm: installing webperf2003 T305460
  • 13:52 kart_: UTC afternoon backport window - Done.
  • 13:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs1004.eqiad.wmnet
  • 13:51 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Rearrange zh namespace names and namespace aliases (T286291 T298308) (duration: 00m 53s)
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24159 and previous config saved to /var/cache/conftool/dbconfig/20220406-135132-marostegui.json
  • 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2126.codfw.wmnet with reason: Rebooting for T303174
  • 13:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2126.codfw.wmnet with reason: Rebooting for T303174
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2125.codfw.wmnet with reason: Rebooting for T303174
  • 13:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2125.codfw.wmnet with reason: Rebooting for T303174
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:36 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:36 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4021.ulsfo.wmnet with reason: host reimage
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2107.codfw.wmnet with reason: Rebooting for T303174
  • 13:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2107.codfw.wmnet with reason: Rebooting for T303174
  • 13:31 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4021.ulsfo.wmnet with reason: host reimage
  • 13:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:29 kartik@deploy1002: Synchronized php-1.39.0-wmf.6/extensions/Translate/tag/PageTranslationHooks.php: Backport: Revert "PageTranslationHooks: Don't kick in during interface message parsing" (T305531) (duration: 00m 57s)
  • 13:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:26 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2095.codfw.wmnet with reason: Rebooting for T303174
  • 13:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2095.codfw.wmnet with reason: Rebooting for T303174
  • 13:20 kartik@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgUsingKubernetes the same value as to $wmfUsingKubernetes (T45956) (duration: 00m 55s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2094.codfw.wmnet with reason: Rebooting for T303174
  • 13:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2094.codfw.wmnet with reason: Rebooting for T303174
  • 13:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
  • 13:15 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:15 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
  • 13:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
  • 13:11 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:11 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:11 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:07 mmandere: depool cp4021 for reimage - T290005
  • 13:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T297189)', diff saved to https://phabricator.wikimedia.org/P24158 and previous config saved to /var/cache/conftool/dbconfig/20220406-125117-marostegui.json
  • 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:45 mmandere: pool cp4033 with HAProxy as TLS termination layer - T290005
  • 12:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300775)', diff saved to https://phabricator.wikimedia.org/P24157 and previous config saved to /var/cache/conftool/dbconfig/20220406-123603-marostegui.json
  • 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1163', diff saved to https://phabricator.wikimedia.org/P24156 and previous config saved to /var/cache/conftool/dbconfig/20220406-123505-root.json
  • 12:35 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24152 and previous config saved to /var/cache/conftool/dbconfig/20220406-121222-ladsgroup.json
  • 12:11 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 12:10 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:09 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24151 and previous config saved to /var/cache/conftool/dbconfig/20220406-115717-ladsgroup.json
  • 11:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4033.ulsfo.wmnet with reason: host reimage
  • 11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 11:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4033.ulsfo.wmnet with reason: host reimage
  • 11:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 11:37 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:35 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:32 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4033.ulsfo.wmnet with OS buster
  • 11:32 moritzm: installing wavpack security updates
  • 11:24 mmandere: depool cp4033 for reimage - T290005
  • 11:23 marostegui: dbmaint s3@eqiad T297189
  • 11:23 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 11:22 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:20 mmandere: pool cp4027 with HAProxy as TLS termination layer - T290005
  • 11:12 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4027.ulsfo.wmnet with OS buster
  • 11:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:03 mmandere: pool cp3052 with HAProxy as TLS termination layer - T290005
  • 11:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3052.esams.wmnet with OS buster
  • 11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:57 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24150 and previous config saved to /var/cache/conftool/dbconfig/20220406-103929-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:32 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
  • 10:30 jynus: rerunning es4 dump on backup2002
  • 10:29 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4027.ulsfo.wmnet with reason: host reimage
  • 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:28 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
  • 10:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host deploy2002.codfw.wmnet
  • 10:25 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4027.ulsfo.wmnet with reason: host reimage
  • 10:24 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 10:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
  • 10:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:10 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4027.ulsfo.wmnet with OS buster
  • 10:07 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 10:03 mmandere: depool cp4027 for reimage - T290005
  • 10:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 09:58 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS buster
  • 09:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui1001.eqiad.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host xhgui1001.eqiad.wmnet
  • 09:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 09:51 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 09:50 mmandere: depool cp3052 for reimage - T290005
  • 09:47 moritzm: installing mariadb-10.3 updates from buster 10.12 point release (different from wmf-mariadb packages)
  • 09:44 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:24 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1002.eqiad.wmnet with OS bullseye
  • 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host webperf1003.eqiad.wmnet
  • 09:19 btullis@cumin1001: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto analytics cluster: Reboot Presto nodes
  • 09:17 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:17 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:15 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:15 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1001.eqiad.wmnet
  • 09:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet with reason: Rebooting for T303174
  • 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet with reason: Rebooting for T303174
  • 09:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2023-2025].codfw.wmnet with reason: Rebooting es2023 T303174
  • 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es[2023-2025].codfw.wmnet with reason: Rebooting es2023 T303174
  • 09:08 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24148 and previous config saved to /var/cache/conftool/dbconfig/20220406-090449-ladsgroup.json
  • 09:04 arturo: force-started update-openstack-mirror.service on mirror1001 for python3-eventlet (T305157)
  • 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1001.eqiad.wmnet
  • 09:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1002.eqiad.wmnet
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 08:38 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 08:35 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:35 ayounsi@cumin2002: START - Cookbook sre.network.cf
  • 08:34 jnuche@deploy1002: deploy-promote aborted: (duration: 00m 40s)
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24146 and previous config saved to /var/cache/conftool/dbconfig/20220406-083439-ladsgroup.json
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host webperf1004.eqiad.wmnet
  • 08:27 mmandere: pool cp4035 with HAProxy as TLS termination layer - T290005
  • 08:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:21 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cp5001.eqsin.wmnet with OS buster
  • 08:20 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4035.ulsfo.wmnet with OS buster
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24145 and previous config saved to /var/cache/conftool/dbconfig/20220406-081934-ladsgroup.json
  • 08:18 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:10 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Add zh-hans and zh-hant translation of Module and Module_talk aliases" (T286291 T298308 T165593 T286105) (duration: 00m 56s)
  • 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:07 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host webperf2004.codfw.wmnet
  • 07:56 kharlan@deploy1002: Synchronized wmf-config: Config: GrowthExperiments: Add mailing list question for eswiki (T303240 T305015) (duration: 00m 56s)
  • 07:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4035.ulsfo.wmnet with reason: host reimage
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4035.ulsfo.wmnet with reason: host reimage
  • 07:40 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5001.eqsin.wmnet with reason: host reimage
  • 07:38 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:38 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host webperf2004.codfw.wmnet
  • 07:36 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5001.eqsin.wmnet with reason: host reimage
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host webperf2003.codfw.wmnet
  • 07:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2002.wikimedia.org
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
  • 07:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4035.ulsfo.wmnet with OS buster
  • 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
  • 07:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:21 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:20 mmandere: depool cp4035 for reimage - T290005
  • 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:12 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5001.eqsin.wmnet with OS buster
  • 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:04 mmandere: depool cp5001 for reimage - T290005
  • 07:03 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:03 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host webperf2003.codfw.wmnet
  • 06:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24144 and previous config saved to /var/cache/conftool/dbconfig/20220406-064633-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 05:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 05:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 03:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 03:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:59 ejegg: updated civicrm from 87bc3114 to 7b7b284d
  • 02:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 02:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 02:38 cstone: payments-wiki revision changed from 6f888c28 to 4e42d75f
  • 01:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24143 and previous config saved to /var/cache/conftool/dbconfig/20220406-014925-ladsgroup.json
  • 01:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24142 and previous config saved to /var/cache/conftool/dbconfig/20220406-013420-ladsgroup.json
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24141 and previous config saved to /var/cache/conftool/dbconfig/20220406-011915-ladsgroup.json
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24140 and previous config saved to /var/cache/conftool/dbconfig/20220406-010410-ladsgroup.json

2022-04-05

  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24139 and previous config saved to /var/cache/conftool/dbconfig/20220405-233042-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24138 and previous config saved to /var/cache/conftool/dbconfig/20220405-224352-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24137 and previous config saved to /var/cache/conftool/dbconfig/20220405-222847-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24136 and previous config saved to /var/cache/conftool/dbconfig/20220405-221342-ladsgroup.json
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24135 and previous config saved to /var/cache/conftool/dbconfig/20220405-215837-ladsgroup.json
  • 21:21 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fd8b410] (duration: 06m 48s)
  • 21:14 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fd8b410]
  • 21:14 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410] (thin): Regular analytics weekly train THIN [analytics/refinery@fd8b410] (duration: 00m 10s)
  • 21:14 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410] (thin): Regular analytics weekly train THIN [analytics/refinery@fd8b410]
  • 21:13 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410]: Regular analytics weekly train [analytics/refinery@fd8b410] (duration: 22m 50s)
  • 21:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6014.drmrs.wmnet
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24133 and previous config saved to /var/cache/conftool/dbconfig/20220405-205822-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 20:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6014.drmrs.wmnet
  • 20:50 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410]: Regular analytics weekly train [analytics/refinery@fd8b410]
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 urbanecm: UTC late B&C window done
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mutante: puppetmaster1001 - running test downloads of geoip databases to a temp dir
  • 20:47 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 8ea8634: Change upload dialog automatic upload comments (T305303) (duration: 00m 54s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:41 razzi: deploying refinery for https://gerrit.wikimedia.org/r/c/analytics/refinery/+/776269/
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6013.drmrs.wmnet
  • 20:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 10c16c5: [config]: Undeploy GDI survey from EN,FR and ES wikis in PROD (T303962) (duration: 00m 55s)
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6013.drmrs.wmnet
  • 20:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6012.drmrs.wmnet
  • 20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24132 and previous config saved to /var/cache/conftool/dbconfig/20220405-201315-ladsgroup.json
  • 20:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6012.drmrs.wmnet
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24131 and previous config saved to /var/cache/conftool/dbconfig/20220405-195810-ladsgroup.json
  • 19:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6011.drmrs.wmnet
  • 19:49 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw(1307|1308|1309|1310|1311|1318|1334|1335|1336|1337).*
  • 19:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6011.drmrs.wmnet
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24130 and previous config saved to /var/cache/conftool/dbconfig/20220405-194305-ladsgroup.json
  • 19:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6010.drmrs.wmnet
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6010.drmrs.wmnet
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24129 and previous config saved to /var/cache/conftool/dbconfig/20220405-192800-ladsgroup.json
  • 19:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6009.drmrs.wmnet
  • 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6009.drmrs.wmnet
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6006.drmrs.wmnet
  • 18:47 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6006.drmrs.wmnet
  • 18:42 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 18:41 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 18:37 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 18:34 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.1-1+deb11u1_amd64.changes
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6005.drmrs.wmnet
  • 18:28 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.1-1_amd64.changes # T299705
  • 18:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2015.codfw.wmnet
  • 18:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2016.codfw.wmnet
  • 18:25 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2017.codfw.wmnet
  • 18:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2018.codfw.wmnet
  • 18:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6005.drmrs.wmnet
  • 18:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2019.codfw.wmnet
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2015.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2020.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2016.codfw.wmnet
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24128 and previous config saved to /var/cache/conftool/dbconfig/20220405-181712-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24127 and previous config saved to /var/cache/conftool/dbconfig/20220405-181658-ladsgroup.json
  • 18:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6004.drmrs.wmnet
  • 18:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6004.drmrs.wmnet
  • 18:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24126 and previous config saved to /var/cache/conftool/dbconfig/20220405-180153-ladsgroup.json
  • 18:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 17:59 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parse2020.codfw.wmnet
  • 17:59 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 17:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2020.codfw.wmnet
  • 17:58 mutante: rebooting hosts in the parse201* range, starting with parse2019, counting down
  • 17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6003.drmrs.wmnet
  • 17:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 17:56 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 17:54 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host parse2020.codfw.wmnet
  • 17:53 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 17:52 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse201[7-9].codfw.wmnet
  • 17:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse201[7-9].wmnet
  • 17:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2020.wmnet
  • 17:49 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6003.drmrs.wmnet
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4035.ulsfo.wmnet
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6002.drmrs.wmnet
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24125 and previous config saved to /var/cache/conftool/dbconfig/20220405-174648-ladsgroup.json
  • 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6002.drmrs.wmnet
  • 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4035.ulsfo.wmnet
  • 17:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4033.ulsfo.wmnet
  • 17:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1040.eqiad.wmnet
  • 17:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1041.eqiad.wmnet
  • 17:32 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1040.eqiad.wmnet
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24124 and previous config saved to /var/cache/conftool/dbconfig/20220405-173143-ladsgroup.json
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6001.drmrs.wmnet
  • 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4033.ulsfo.wmnet
  • 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4021.ulsfo.wmnet
  • 17:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1042.eqiad.wmnet
  • 17:28 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1041.eqiad.wmnet
  • 17:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6001.drmrs.wmnet
  • 17:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1043.eqiad.wmnet
  • 17:23 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1042.eqiad.wmnet
  • 17:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1146.eqiad.wmnet with OS buster
  • 17:22 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4021.ulsfo.wmnet
  • 17:21 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1044.eqiad.wmnet
  • 17:21 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 17:18 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1043.eqiad.wmnet
  • 17:17 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1045.eqiad.wmnet
  • 17:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1044.eqiad.wmnet
  • 17:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:13 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 17:12 mutante: serially rebooting hosts in the wtp104* range
  • 17:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:09 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1045.eqiad.wmnet
  • 17:08 mutante: wtp1046 - rebooting
  • 17:06 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1007.eqiad.wmnet
  • 17:06 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1007.eqiad.wmnet
  • 17:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1007.eqiad.wmnet with OS bullseye
  • 17:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 17:05 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:54 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
  • 16:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 16:51 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 16:49 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 16:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:48 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 16:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:43 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 16:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:41 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:38 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 16:36 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1007.eqiad.wmnet with OS bullseye
  • 16:35 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:35 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24123 and previous config saved to /var/cache/conftool/dbconfig/20220405-163454-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 16:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Upgrade dbstore1007 to bullseye
  • 16:32 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Upgrade dbstore1007 to bullseye
  • 16:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1005.eqiad.wmnet
  • 16:32 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1005.eqiad.wmnet
  • 16:19 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1005.eqiad.wmnet with OS bullseye
  • 16:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1090.eqiad.wmnet
  • 16:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:07 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1005.eqiad.wmnet with reason: host reimage
  • 16:02 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1005.eqiad.wmnet with reason: host reimage
  • 16:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 16:01 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1090.eqiad.wmnet
  • 15:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 15:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 15:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:52 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1005.eqiad.wmnet with OS bullseye
  • 15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 15:49 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Upgrade dbstore1005 to bullseye
  • 15:49 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Upgrade dbstore1005 to bullseye
  • 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:47 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1003.eqiad.wmnet
  • 15:47 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1003.eqiad.wmnet
  • 15:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 15:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2024.codfw.wmnet with reason: Rebooting for T303174
  • 15:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2024.codfw.wmnet with reason: Rebooting for T303174
  • 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 15:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 15:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 15:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 15:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.5/includes: Backport: ParserOutputAccess: Allow calling getPO with option of not saving in PC (T285993) (duration: 01m 00s)
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1089.eqiad.wmnet
  • 15:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:40 moritzm: drain ganeti2019 T305469
  • 15:39 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1003.eqiad.wmnet with OS bullseye
  • 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1088.eqiad.wmnet
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 15:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 15:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 15:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4036.ulsfo.wmnet
  • 15:26 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:26 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:25 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1003.eqiad.wmnet with reason: host reimage
  • 15:23 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1088.eqiad.wmnet
  • 15:23 mmandere: pool cp5007 with HAProxy as TLS termination layer - T290005
  • 15:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet with reason: Rebooting for T303174
  • 15:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet with reason: Rebooting for T303174
  • 15:20 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1003.eqiad.wmnet with reason: host reimage
  • 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5007.eqsin.wmnet with OS buster
  • 15:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 15:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:12 moritzm: installing atftp security updates
  • 15:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2022.codfw.wmnet with reason: Rebooting for T303174
  • 15:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2022.codfw.wmnet with reason: Rebooting for T303174
  • 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3065.esams.wmnet
  • 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1087.eqiad.wmnet
  • 15:10 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1003.eqiad.wmnet with OS bullseye
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2020.codfw.wmnet with reason: Rebooting for T303174
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2020.codfw.wmnet with reason: Rebooting for T303174
  • 15:02 mmandere: pool cp5013 with HAProxy as TLS termination layer - T290005
  • 15:01 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Taking host offline to upgrade to Bullseye
  • 15:01 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Taking host offline to upgrade to Bullseye
  • 15:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5013.eqsin.wmnet with OS buster
  • 14:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2013.codfw.wmnet with reason: Rebooting for T303174
  • 14:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2013.codfw.wmnet with reason: Rebooting for T303174
  • 14:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3065.esams.wmnet
  • 14:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1087.eqiad.wmnet
  • 14:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4036.ulsfo.wmnet
  • 14:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5007.eqsin.wmnet with reason: host reimage
  • 14:49 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2012.codfw.wmnet with reason: Rebooting for T303174
  • 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2012.codfw.wmnet with reason: Rebooting for T303174
  • 14:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5007.eqsin.wmnet with reason: host reimage
  • 14:44 vgutierrez: re-pool cp1086
  • 14:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2011.codfw.wmnet with reason: Rebooting for T303174
  • 14:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2011.codfw.wmnet with reason: Rebooting for T303174
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24122 and previous config saved to /var/cache/conftool/dbconfig/20220405-143316-ladsgroup.json
  • 14:31 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
  • 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster1001.eqiad.wmnet with reason: reimage
  • 14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster1001.eqiad.wmnet with reason: reimage
  • 14:31 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
  • 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2014.codfw.wmnet with reason: Rebooting for T303174
  • 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2014.codfw.wmnet with reason: Rebooting for T303174
  • 14:22 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5007.eqsin.wmnet with OS buster
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24121 and previous config saved to /var/cache/conftool/dbconfig/20220405-141811-ladsgroup.json
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:12 mmandere: depool cp5007 for reimage - T290005
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5013.eqsin.wmnet with OS buster
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable videojs on all of DIP wikis (T248418) (duration: 00m 53s)
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24120 and previous config saved to /var/cache/conftool/dbconfig/20220405-140306-ladsgroup.json
  • 13:58 mmandere: depool cp5013 for reimage - T290005
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3004.wikimedia.org
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24119 and previous config saved to /var/cache/conftool/dbconfig/20220405-134801-ladsgroup.json
  • 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deneb.codfw.wmnet
  • 13:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1086.eqiad.wmnet
  • 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3004.wikimedia.org
  • 13:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Cluster re-init for new IP ranges
  • 13:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Cluster re-init for new IP ranges
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deneb.codfw.wmnet
  • 13:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1086.eqiad.wmnet
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster2001.codfw.wmnet with reason: reimage
  • 13:23 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster2001.codfw.wmnet with reason: reimage
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
  • 13:20 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgUdp2logDest the same value as to $wmfUdp2logDest (T45956) (duration: 00m 54s)
  • 13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:17 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Pin CheckUser actor migration to old schema (T233004) (duration: 00m 54s)
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4030.ulsfo.wmnet
  • 13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4030.ulsfo.wmnet
  • 13:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3064.esams.wmnet
  • 13:03 moritzm: installing openssl updates from buster 10.12 point release
  • 13:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
  • 12:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
  • 12:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:54 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3064.esams.wmnet
  • 12:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1085.eqiad.wmnet
  • 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24117 and previous config saved to /var/cache/conftool/dbconfig/20220405-124745-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24116 and previous config saved to /var/cache/conftool/dbconfig/20220405-124732-ladsgroup.json
  • 12:46 mmandere: pool cp6007 with HAProxy as TLS termination layer - T290005
  • 12:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1085.eqiad.wmnet
  • 12:40 mmandere: pool cp5015 with HAProxy as TLS termination layer - T290005
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24115 and previous config saved to /var/cache/conftool/dbconfig/20220405-123227-ladsgroup.json
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:22 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS buster
  • 12:18 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24114 and previous config saved to /var/cache/conftool/dbconfig/20220405-121722-ladsgroup.json
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5015.eqsin.wmnet with OS buster
  • 11:56 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
  • 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:52 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.6 refs T305212
  • 11:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
  • 11:48 jnuche@deploy1002: Finished scap: resync wmf.6 to reapply security patches - T305212 (duration: 02m 50s)
  • 11:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
  • 11:45 jnuche@deploy1002: Started scap: resync wmf.6 to reapply security patches - T305212
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 T305427', diff saved to https://phabricator.wikimedia.org/P24112 and previous config saved to /var/cache/conftool/dbconfig/20220405-113944-root.json
  • 11:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS buster
  • 11:31 mmandere: depool cp6007 for reimage - T290005
  • 11:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:23 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5015.eqsin.wmnet with OS buster
  • 11:15 mmandere: depool cp5015 for reimage - T290005
  • 11:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:10 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24111 and previous config saved to /var/cache/conftool/dbconfig/20220405-110232-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24110 and previous config saved to /var/cache/conftool/dbconfig/20220405-110224-ladsgroup.json
  • 11:03 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:56 volans: installed spicerack v2.4.0 on the cumin hosts
  • 10:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24109 and previous config saved to /var/cache/conftool/dbconfig/20220405-104719-ladsgroup.json
  • 10:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 10:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24108 and previous config saved to /var/cache/conftool/dbconfig/20220405-103214-ladsgroup.json
  • 10:30 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:30 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:30 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:19 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:18 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24107 and previous config saved to /var/cache/conftool/dbconfig/20220405-101709-ladsgroup.json
  • 09:49 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24105 and previous config saved to /var/cache/conftool/dbconfig/20220405-091947-ladsgroup.json
  • 09:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24104 and previous config saved to /var/cache/conftool/dbconfig/20220405-091939-ladsgroup.json
  • 09:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.6"
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24103 and previous config saved to /var/cache/conftool/dbconfig/20220405-090434-ladsgroup.json
  • 08:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24102 and previous config saved to /var/cache/conftool/dbconfig/20220405-084928-ladsgroup.json
  • 08:49 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:35 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24101 and previous config saved to /var/cache/conftool/dbconfig/20220405-083423-ladsgroup.json
  • 08:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:31 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.6 refs T305212
  • 08:28 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:26 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dragonfly-supernode2001.codfw.wmnet
  • 08:23 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:21 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.6 refs T305212 (duration: 42m 53s)
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
  • 08:13 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:13 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:52 XioNoX: disable BGP to Tata in drmrs for circuit move - T298208
  • 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:38 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.6 refs T305212
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24100 and previous config saved to /var/cache/conftool/dbconfig/20220405-073617-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24099 and previous config saved to /var/cache/conftool/dbconfig/20220405-073608-ladsgroup.json
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24098 and previous config saved to /var/cache/conftool/dbconfig/20220405-072103-ladsgroup.json
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24097 and previous config saved to /var/cache/conftool/dbconfig/20220405-070558-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24096 and previous config saved to /var/cache/conftool/dbconfig/20220405-065053-ladsgroup.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1132 T301879', diff saved to https://phabricator.wikimedia.org/P24095 and previous config saved to /var/cache/conftool/dbconfig/20220405-063648-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 into API for testing T301879', diff saved to https://phabricator.wikimedia.org/P24094 and previous config saved to /var/cache/conftool/dbconfig/20220405-060124-marostegui.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 for testing T301879', diff saved to https://phabricator.wikimedia.org/P24093 and previous config saved to /var/cache/conftool/dbconfig/20220405-055256-marostegui.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24092 and previous config saved to /var/cache/conftool/dbconfig/20220405-054610-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24091 and previous config saved to /var/cache/conftool/dbconfig/20220405-054602-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24090 and previous config saved to /var/cache/conftool/dbconfig/20220405-053057-ladsgroup.json
  • 05:17 _joe_: uploading new minor version of conftool to apt for buster/bullseye (requestctl new feature)
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24089 and previous config saved to /var/cache/conftool/dbconfig/20220405-051552-ladsgroup.json
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24088 and previous config saved to /var/cache/conftool/dbconfig/20220405-050047-ladsgroup.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 for testing T301879', diff saved to https://phabricator.wikimedia.org/P24087 and previous config saved to /var/cache/conftool/dbconfig/20220405-043426-marostegui.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24086 and previous config saved to /var/cache/conftool/dbconfig/20220405-040309-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 04:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24085 and previous config saved to /var/cache/conftool/dbconfig/20220405-040301-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24084 and previous config saved to /var/cache/conftool/dbconfig/20220405-034756-ladsgroup.json
  • 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24083 and previous config saved to /var/cache/conftool/dbconfig/20220405-033251-ladsgroup.json
  • 03:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24082 and previous config saved to /var/cache/conftool/dbconfig/20220405-031745-ladsgroup.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24081 and previous config saved to /var/cache/conftool/dbconfig/20220405-022132-ladsgroup.json
  • 02:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24080 and previous config saved to /var/cache/conftool/dbconfig/20220405-022124-ladsgroup.json
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24079 and previous config saved to /var/cache/conftool/dbconfig/20220405-020619-ladsgroup.json
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: T305423
  • 01:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: T305423
  • 01:57 eileen: process control config revision changed from 06379640 to 25728a0e
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24078 and previous config saved to /var/cache/conftool/dbconfig/20220405-015114-ladsgroup.json
  • 01:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp5002.eqsin.wmnet
  • 01:42 eileen: civicrm revision changed from 84c737b6 to 87bc3114
  • 01:37 eileen: config revision changed from bb0e1af3 to 06379640
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24077 and previous config saved to /var/cache/conftool/dbconfig/20220405-013609-ladsgroup.json
  • 01:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet
  • 01:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet
  • 01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
  • 01:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet
  • 00:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4034.ulsfo.wmnet
  • 00:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5016.eqsin.wmnet
  • 00:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet
  • 00:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet
  • 00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4034.ulsfo.wmnet
  • 00:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet
  • 00:43 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5016.eqsin.wmnet
  • 00:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet
  • 00:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet
  • 00:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet
  • 00:39 mutante: gitlab1001 - mv 1648814678_2022_04_01_14.9.1_gitlab_backup.tar and other files from April 2nd/April 3rd over from /srv/gitlab-backup to /mnt/gitlab-backup to prevent another outage due to disk space T274463
  • 00:36 mutante: gitlab2001 - apt-get clean to prevent disk space issues
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24076 and previous config saved to /var/cache/conftool/dbconfig/20220405-003419-ladsgroup.json
  • 00:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24075 and previous config saved to /var/cache/conftool/dbconfig/20220405-003405-ladsgroup.json
  • 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet
  • 00:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 00:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1047.eqiad.wmnet
  • 00:32 mutante: gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover... T274463 - <+icinga-wm> RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK
  • 00:30 mutante: gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover...
  • 00:27 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1048.eqiad.wmnet
  • 00:23 mutante: wtp1046, wtp1047, wtp1048 - rebooting, one at a time
  • 00:21 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp104[6-8].eqiad.wmnet
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24074 and previous config saved to /var/cache/conftool/dbconfig/20220405-001900-ladsgroup.json
  • 00:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet
  • 00:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet
  • 00:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24073 and previous config saved to /var/cache/conftool/dbconfig/20220405-000355-ladsgroup.json

2022-04-04

  • 23:51 mutante: apt1001 - importing gitlab-runner package for bullseye via: 'sudo -E reprepro --noskipold --component thirdparty/gitlab-runner update bullseye-wikimedia' after gerrit:767604 (T297659)
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24072 and previous config saved to /var/cache/conftool/dbconfig/20220404-234850-ladsgroup.json
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24071 and previous config saved to /var/cache/conftool/dbconfig/20220404-224836-ladsgroup.json
  • 22:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24070 and previous config saved to /var/cache/conftool/dbconfig/20220404-224828-ladsgroup.json
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24069 and previous config saved to /var/cache/conftool/dbconfig/20220404-223323-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24068 and previous config saved to /var/cache/conftool/dbconfig/20220404-221818-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24067 and previous config saved to /var/cache/conftool/dbconfig/20220404-220313-ladsgroup.json
  • 21:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
  • 21:14 mutante: puppetmaster1001/puppetmaster2003 - geoip / maxmind database update timers renamed: 'geoip_update_legacy' became 'geoip_update_main', 'geoip_update' became 'geoip_update_ipinfo'. The confusing 'legacy' term is no longer used, as suggested in T303464
  • 21:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet
  • 21:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet
  • 21:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
  • 21:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet
  • 21:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24066 and previous config saved to /var/cache/conftool/dbconfig/20220404-205932-ladsgroup.json
  • 20:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 20:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24065 and previous config saved to /var/cache/conftool/dbconfig/20220404-205924-ladsgroup.json
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24064 and previous config saved to /var/cache/conftool/dbconfig/20220404-204419-ladsgroup.json
  • 20:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
  • 20:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
  • 20:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3061.esams.wmnet
  • 20:32 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
  • 20:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
  • 20:30 urbanecm: UTC late B&C window completed
  • 20:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3061.esams.wmnet
  • 20:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8c81de9: Remove wgWMEIPAddressCopyActionEnabled from Beta and production config (T296469) (duration: 00m 51s)
  • 20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24063 and previous config saved to /var/cache/conftool/dbconfig/20220404-202914-ladsgroup.json
  • 20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5006.eqsin.wmnet
  • 20:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:16 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5006.eqsin.wmnet
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24062 and previous config saved to /var/cache/conftool/dbconfig/20220404-201409-ladsgroup.json
  • 20:11 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
  • 20:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3060.esams.wmnet
  • 20:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
  • 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
  • 20:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp3060.esams.wmnet
  • 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
  • 19:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5005.eqsin.wmnet
  • 19:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
  • 19:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2040.codfw.wmnet
  • 19:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
  • 19:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2040.codfw.wmnet
  • 19:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
  • 19:38 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
  • 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1002.eqiad.wmnet
  • 19:35 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1002.eqiad.wmnet
  • 19:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2002.codfw.wmnet
  • 19:33 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon2002.codfw.wmnet
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
  • 19:22 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1001.eqiad.wmnet
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24061 and previous config saved to /var/cache/conftool/dbconfig/20220404-191750-ladsgroup.json
  • 19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24060 and previous config saved to /var/cache/conftool/dbconfig/20220404-191743-ladsgroup.json
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-tls
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-be
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=varnish-fe
  • 19:16 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1001.eqiad.wmnet
  • 19:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4026.ulsfo.wmnet
  • 19:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3059.esams.wmnet
  • 19:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp5005.eqsin.wmnet
  • 19:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4026.ulsfo.wmnet
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24059 and previous config saved to /var/cache/conftool/dbconfig/20220404-190238-ladsgroup.json
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3059.esams.wmnet
  • 18:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2039.codfw.wmnet
  • 18:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
  • 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
  • 18:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2039.codfw.wmnet
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24058 and previous config saved to /var/cache/conftool/dbconfig/20220404-184733-ladsgroup.json
  • 18:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3058.esams.wmnet
  • 18:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
  • 18:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4025.ulsfo.wmnet
  • 18:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
  • 18:39 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4025.ulsfo.wmnet
  • 18:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage2001.codfw.wmnet
  • 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3058.esams.wmnet
  • 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
  • 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
  • 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2038.codfw.wmnet
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24057 and previous config saved to /var/cache/conftool/dbconfig/20220404-183227-ladsgroup.json
  • 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
  • 18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2038.codfw.wmnet
  • 18:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
  • 18:25 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
  • 18:25 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
  • 18:25 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
  • 18:08 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
  • 17:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
  • 17:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 17:27 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24056 and previous config saved to /var/cache/conftool/dbconfig/20220404-172707-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24055 and previous config saved to /var/cache/conftool/dbconfig/20220404-172659-ladsgroup.json
  • 17:25 XioNoX: push urpf DHCP exception to all core routers with urpf configured - T285461
  • 17:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
  • 17:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2037.codfw.wmnet
  • 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2037.codfw.wmnet
  • 17:16 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24054 and previous config saved to /var/cache/conftool/dbconfig/20220404-171154-ladsgroup.json
  • 17:11 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:10 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24053 and previous config saved to /var/cache/conftool/dbconfig/20220404-165649-ladsgroup.json
  • 16:50 taavi: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Brand" "Brand/Archive" "Majavah" --reason 'phab:T305387' # T305387
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24052 and previous config saved to /var/cache/conftool/dbconfig/20220404-164144-ladsgroup.json
  • 16:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 16:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 16:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 16:09 volans: uploaded spicerack_2.4.0 to apt.wikimedia.org bullseye-wikimedia
  • 16:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:08 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 16:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:02 bblack@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
  • 16:00 bblack@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
  • 15:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 15:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 15:44 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24051 and previous config saved to /var/cache/conftool/dbconfig/20220404-153846-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24050 and previous config saved to /var/cache/conftool/dbconfig/20220404-153839-ladsgroup.json
  • 15:28 moritzm: remove stray debmonitor-server/cumin installs (cleanup of 548425b)
  • 15:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1002.eqiad.wmnet
  • 15:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24049 and previous config saved to /var/cache/conftool/dbconfig/20220404-152333-ladsgroup.json
  • 15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host releases1002.eqiad.wmnet
  • 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Use "unexpectedUnconnectedPage" page prop on Beta (production no-op) (duration: 00m 50s)
  • 15:17 mmandere: pool cp6015 with HAProxy as TLS termination layer - T290005
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24048 and previous config saved to /var/cache/conftool/dbconfig/20220404-150828-ladsgroup.json
  • 15:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
  • 15:05 mmandere: pool cp5008 with HAProxy as TLS termination layer - T290005
  • 15:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5008.eqsin.wmnet with OS buster
  • 14:55 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24047 and previous config saved to /var/cache/conftool/dbconfig/20220404-145323-ladsgroup.json
  • 14:44 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 14:44 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
  • 14:42 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 14:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 14:37 herron: rebooting alert2001
  • 14:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
  • 14:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2002.codfw.wmnet
  • 14:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host releases2002.codfw.wmnet
  • 14:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
  • 14:16 mmandere: depool cp6015 for reimage - T290005
  • 14:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5008.eqsin.wmnet with OS buster
  • 14:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 13:58 mmandere: depool cp5008 for reimage - T290005
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24045 and previous config saved to /var/cache/conftool/dbconfig/20220404-135314-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24044 and previous config saved to /var/cache/conftool/dbconfig/20220404-135307-ladsgroup.json
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
  • 13:44 mmandere: pool cp3055 with HAProxy as TLS termination layer - T290005
  • 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24043 and previous config saved to /var/cache/conftool/dbconfig/20220404-133801-ladsgroup.json
  • 13:35 mmandere: pool cp4022 with HAProxy as TLS termination layer - T290005
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
  • 13:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS buster
  • 13:31 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4022.ulsfo.wmnet with OS buster
  • 13:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
  • 13:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24042 and previous config saved to /var/cache/conftool/dbconfig/20220404-132256-ladsgroup.json
  • 13:20 urbanecm: UTC afternoon B&C window done
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 daniel@deploy1002: Synchronized multiversion/defines.php: Config: Always set MW_USE_CONFIG_SCHEMA. (T305176) (duration: 00m 50s)
  • 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24041 and previous config saved to /var/cache/conftool/dbconfig/20220404-130751-ladsgroup.json
  • 13:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:05 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 13:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7ebad8f: Add logo variants for zhwiki (T273578) (duration: 00m 51s)
  • 13:04 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
  • 12:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
  • 12:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 12:48 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4022.ulsfo.wmnet with OS buster
  • 12:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 12:43 moritzm: installing gmp security updates
  • 12:42 mmandere: depool cp4022 for reimage - T290005
  • 12:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS buster
  • 12:35 ottomata: removing retention.ms override from eventstreams publicly exposed topics in kafka main-eqiad and main-codfw - T241178
  • 12:31 mmandere: depool cp3055 for reimage - T290005
  • 12:31 ottomata: deleting empty typo topics from kafka main-eqiad: eqiad.mediawiki.page-edit (found while working on T241178)
  • 12:26 ottomata: deleting empty typo topics from kafka main-codfw: codfw.mediawiki.page_delete, codfw.mediawiki.page_move, codfw.mediawiki.page_restore, codfw.mediawiki.revision_create, codfw.mediawiki.revision_visibility_set, codfw.mediawiki.user_block (found while working on T241178)
  • 12:18 moritzm: installing expat updates (follow-ups to earlier security fixes, no security impact by themselves)
  • 12:11 mmandere: pool cp4028 with HAProxy as TLS termination layer - T290005
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24040 and previous config saved to /var/cache/conftool/dbconfig/20220404-121030-ladsgroup.json
  • 12:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24039 and previous config saved to /var/cache/conftool/dbconfig/20220404-121022-ladsgroup.json
  • 12:05 mmandere: pool cp3054 with HAProxy as TLS termination layer - T290005
  • 12:04 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4028.ulsfo.wmnet with OS buster
  • 12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS buster
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24038 and previous config saved to /var/cache/conftool/dbconfig/20220404-115516-ladsgroup.json
  • 11:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24037 and previous config saved to /var/cache/conftool/dbconfig/20220404-114011-ladsgroup.json
  • 11:39 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:37 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
  • 11:37 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
  • 11:34 moritzm: installing zziplib security updates
  • 11:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
  • 11:27 moritzm: installing jbig2dec security updates
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24036 and previous config saved to /var/cache/conftool/dbconfig/20220404-112506-ladsgroup.json
  • 11:20 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4028.ulsfo.wmnet with OS buster
  • 11:18 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:12 mmandere: depool cp4028 for reimage - T290005
  • 11:11 volans: deploying python3-wmflib 1.2.0 fleet-wide
  • 11:09 jforrester@deploy1002: Finished deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block (duration: 00m 08s)
  • 11:09 jforrester@deploy1002: Started deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block
  • 11:07 moritzm: installing cups security updates on buster (client side tools/libs)
  • 11:04 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS buster
  • 10:53 mmandere: depool cp3054 for reimage - T290005
  • 10:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1003.eqiad.wmnet
  • 10:38 volans: uploaded python3-wmflib_1.2.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 10:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1003.eqiad.wmnet
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24035 and previous config saved to /var/cache/conftool/dbconfig/20220404-102616-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24034 and previous config saved to /var/cache/conftool/dbconfig/20220404-102609-ladsgroup.json
  • 10:26 moritzm: installing libxml2 security updates
  • 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1004.eqiad.wmnet
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24033 and previous config saved to /var/cache/conftool/dbconfig/20220404-101104-ladsgroup.json
  • 10:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1004.eqiad.wmnet
  • 10:08 moritzm: installing icu bugfix updates from buster 10.12 point release
  • 09:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1005.eqiad.wmnet
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24032 and previous config saved to /var/cache/conftool/dbconfig/20220404-095558-ladsgroup.json
  • 09:55 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
  • 09:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1005.eqiad.wmnet
  • 09:51 mmandere: pool cp6008 with HAProxy as TLS termination layer - T290005
  • 09:48 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
  • 09:47 moritzm: installing zlib security updates
  • 09:44 mmandere: pool cp5003 with HAProxy as TLS termination layer - T290005
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24031 and previous config saved to /var/cache/conftool/dbconfig/20220404-094053-ladsgroup.json
  • 09:31 moritzm: rolling restart of FPM/Apache on mw canaries to pick up updated zlib/glibc/openssl/libxml
  • 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 09:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 09:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
  • 09:26 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 09:25 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5003.eqsin.wmnet with OS buster
  • 09:16 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 09:12 moritzm: installing openssl updates from Buster 10.12 point release
  • 09:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 08:59 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 08:59 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
  • 08:56 moritzm: installing glibc updates from buster 10.12 point release
  • 08:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P24030 and previous config saved to /var/cache/conftool/dbconfig/20220404-084523-root.json
  • 08:43 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:42 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
  • 08:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:37 moritzm: installing flac security updates
  • 08:37 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:37 mmandere: depool cp6008 for reimage - T290005
  • 08:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:31 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:31 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24029 and previous config saved to /var/cache/conftool/dbconfig/20220404-083031-ladsgroup.json
  • 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5003.eqsin.wmnet with OS buster
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:25 urbanecm@deploy1002: Synchronized logos/config.yaml: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (3/3) (duration: 00m 50s)
  • 08:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (2/3) (duration: 00m 50s)
  • 08:23 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (1/3) (duration: 00m 51s)
  • 08:19 mmandere: depool cp5003 for reimage - T290005
  • 08:02 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 14s)
  • 08:01 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 07:54 jayme: imported scap 4.6.0 to stretch-/buster-/bullseye-wikimedia - T305250
  • 07:44 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:43 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 07:43 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 07:43 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 07:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:39 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:23 taavi: UTC morning deployments done
  • 07:21 taavi@deploy1002: Synchronized wmf-config/throttle.php: Config: throttle: removed expired rule (T304836) (duration: 00m 49s)
  • 07:19 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: Revert "fawiki: Set celebration logo for new vector" (T304314) (duration: 00m 49s)
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "fawiki: Set celebration logo for new vector" (T304314) (duration: 00m 50s)
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:15 taavi@deploy1002: Synchronized static/images/project-logos: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 50s)
  • 07:14 taavi@deploy1002: Synchronized logos/config.yaml: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 50s)
  • 07:13 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 51s)
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content and Section Translation for Persian Wikipedia (T296475) (duration: 00m 51s)
  • 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24027 and previous config saved to /var/cache/conftool/dbconfig/20220404-060542-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24026 and previous config saved to /var/cache/conftool/dbconfig/20220404-055037-ladsgroup.json
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1130.eqiad.wmnet with OS bullseye
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24025 and previous config saved to /var/cache/conftool/dbconfig/20220404-053531-ladsgroup.json
  • 05:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
  • 05:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24024 and previous config saved to /var/cache/conftool/dbconfig/20220404-052026-ladsgroup.json
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1130.eqiad.wmnet with OS bullseye
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24023 and previous config saved to /var/cache/conftool/dbconfig/20220404-041545-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 03:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 02:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 02:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance

2022-04-02

  • 11:26 akosiaris: disable zotero paging until T291707 is resolved.
  • 11:11 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 11:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync

2022-04-01

  • 23:25 mutante: DNS - new project language 'kcg'. 'Tyap is a regionally important dialect cluster of Plateau languages in Nigeria's Middle Belt, named after its prestige dialect. It is also known by its Hausa exonym as Katab or Kataf.' T305279
  • 23:08 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 23:08 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 22:04 bblack: esams re-pooled - T304089
  • 20:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:48 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
  • 19:47 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
  • 19:44 mutante: rebooting parsoid canary appservers - wtp1025, wtp1026, parse2001, parse2002
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].eqiad.wmnet
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=parse200[1-2].eqiad.wmnet
  • 19:37 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
  • 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
  • 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1450.eqiad.wmnet
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=varnish-fe
  • 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-tls
  • 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-be
  • 19:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
  • 19:16 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
  • 19:01 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
  • 19:00 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp2036.codfw.wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1414.wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw1414.wmnet
  • 18:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw141[4-8].wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2036.codfw.wmnet
  • 13:05 dcausse: resetting jvmquake flag on all wdqs hosts
  • 12:52 dcausse: restarting blazegraph on wdqs1006 and resetting jvmquake warning flag
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 11:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 10:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 10:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 10:47 vgutierrez: reboot acme-chief instances to catch up on kernel upgrades
  • 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6002.drmrs.wmnet
  • 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6002.drmrs.wmnet
  • 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6001.drmrs.wmnet
  • 10:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6001.drmrs.wmnet
  • 10:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet
  • 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet
  • 10:06 vgutierrez: vgutierrez@puppetmaster2001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
  • 10:04 vgutierrez: vgutierrez@puppetmaster1001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
  • 10:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet
  • 09:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet
  • 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet
  • 09:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet
  • 09:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet
  • 09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet
  • 09:35 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir3002.esams.wmnet
  • 09:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet
  • 09:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet
  • 09:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet
  • 09:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
  • 09:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
  • 09:10 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir2001.codfw.wmnet
  • 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2001.codfw.wmnet
  • 08:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet
  • 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet
  • 08:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet
  • 08:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
  • 08:48 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ncredir1001.eqiad.wmnet
  • 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
  • 08:44 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 08:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:42 vgutierrez: rolling restart of ncredir instances to catch up on kernel upgrades
  • 06:54 XioNoX: traffic engineering in drmrs to prevent link saturation

Archives

See Server Admin Log/Archives.