You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage)
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298555)', diff saved to https://phabricator.wikimedia.org/P28208 and previous config saved to /var/cache/conftool/dbconfig/20220521-010640-ladsgroup.json)
(64 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2022-03-16 ==
== 2022-05-21 ==
* 00:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28208 and previous config saved to /var/cache/conftool/dbconfig/20220521-010640-ladsgroup.json
* 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
* 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
* 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 00:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28207 and previous config saved to /var/cache/conftool/dbconfig/20220521-010626-ladsgroup.json
* 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28206 and previous config saved to /var/cache/conftool/dbconfig/20220521-001014-ladsgroup.json
* 00:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 00:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance


== 2022-03-15 ==
== 2022-05-20 ==
* 22:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
* 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28205 and previous config saved to /var/cache/conftool/dbconfig/20220520-224558-ladsgroup.json
* 22:07 andrew@cumin1001: START - Cookbook sre.
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28204 and previous config saved to /var/cache/conftool/dbconfig/20220520-223054-ladsgroup.json
* 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28203 and previous config saved to /var/cache/conftool/dbconfig/20220520-221550-ladsgroup.json
* 22:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
* 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28202 and previous config saved to /var/cache/conftool/dbconfig/20220520-220046-ladsgroup.json
* 21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28201 and previous config saved to /var/cache/conftool/dbconfig/20220520-215514-ladsgroup.json
* 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
* 21:50 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
* 21:38 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye
* 21:37 mutante: correction: mistake was to use FQDN [[phab:T307142|T307142]]
* 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError [[phab:T307142|T307142]]
* 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError
* 21:34 mutante: reimaging gitlab1004 (insetup) to test partman recipe from gerrit:793534 - [[phab:T307142|T307142]]
* 21:34 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
* 21:33 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
* 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28198 and previous config saved to /var/cache/conftool/dbconfig/20220520-190633-ladsgroup.json
* 19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 18:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:55 mutante: [mwmaint1002:~] $ sudo mwscript initSiteStats.php --wiki=kcgwiki --update  (to update statistics for latest wikipedia kcg) [[phab:T305281|T305281]]
* 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5003.eqsin.wmnet with OS bullseye
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22619 and previous config saved to /var/cache/conftool/dbconfig/20220315-200357-marostegui.json
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22618 and previous config saved to /var/cache/conftool/dbconfig/20220315-195934-ladsgroup.json
* 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22617 and previous config saved to /var/cache/conftool/dbconfig/20220315-195657-ladsgroup.json
* 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
* 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
* 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
* 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22616 and previous config saved to /var/cache/conftool/dbconfig/20220315-194152-ladsgroup.json
* 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22615 and previous config saved to /var/cache/conftool/dbconfig/20220315-193029-marostegui.json
* 19:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 19:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22614 and previous config saved to /var/cache/conftool/dbconfig/20220315-192647-ladsgroup.json
* 19:24 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 19:22 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 19:19 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 19:18 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22612 and previous config saved to /var/cache/conftool/dbconfig/20220315-191234-marostegui.json
* 19:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22611 and previous config saved to /var/cache/conftool/dbconfig/20220315-191226-marostegui.json
* 19:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
* 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22610 and previous config saved to /var/cache/conftool/dbconfig/20220315-191140-ladsgroup.json
* 19:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 19:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
* 18:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22609 and previous config saved to /var/cache/conftool/dbconfig/20220315-185721-marostegui.json
* 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
* 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
* 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22608 and previous config saved to /var/cache/conftool/dbconfig/20220315-185413-ladsgroup.json
* 18:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22607 and previous config saved to /var/cache/conftool/dbconfig/20220315-185405-ladsgroup.json
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS stretch
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS stretch
* 18:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryBacklinksprop.php: Backport: [[gerrit:792140{{!}}ApiQueryBacklinksprop: Make sure the index setting exists (T306673)]] (duration: 00m 50s)
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS buster
* 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22606 and previous config saved to /var/cache/conftool/dbconfig/20220315-184216-marostegui.json
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22605 and previous config saved to /var/cache/conftool/dbconfig/20220315-183900-ladsgroup.json
* 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:25 mutante: ACKIng again all unhandled CRIT alerts on hosts with "dev" in their name - (imho dev hosts should not have prod CRIT alerts?)
* 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
* 15:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox-dev2001.wikimedia.org
* 18:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1071.eqiad.wmnet with OS buster
* 15:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS buster
* 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1070.eqiad.wmnet with OS buster
* 15:50 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22604 and previous config saved to /var/cache/conftool/dbconfig/20220315-182711-marostegui.json
* 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22603 and previous config saved to /var/cache/conftool/dbconfig/20220315-182355-ladsgroup.json
* 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox-dev2001.wikimedia.org
* 18:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
* 15:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:792229{{!}} Bumping portals to master (T128546)]] (duration: 00m 51s)
* 18:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
* 15:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:792229{{!}} Bumping portals to master (T128546)]] (duration: 00m 50s)
* 18:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
* 15:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netbox2001-dev.wikimedia.org
* 18:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 15:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS buster
* 15:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:39 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox2001-dev.wikimedia.org
* 18:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update homer wmf-netbox plugin - ayounsi@cumin1001
* 18:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS buster
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS buster
* 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22602 and previous config saved to /var/cache/conftool/dbconfig/20220315-180850-ladsgroup.json
* 15:22 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update homer wmf-netbox plugin - ayounsi@cumin1001
* 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS buster
* 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:18 papaul: rebooting pfw3[a-b]-eqiad for Junos upgrade
* 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:50 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryBacklinksprop.php: Backport: Revert: [[gerrit:792136{{!}}ApiQueryBacklinksprop: Force the correct templatelinks index on read new (T306673)]] (duration: 00m 50s)
* 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:47 ladsgroup@deploy1002: scap failed: average error rate on 3/8 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22601 and previous config saved to /var/cache/conftool/dbconfig/20220315-180542-marostegui.json
* 14:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]]
* 14:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:57 XioNoX: power down mr1-ulsfo for replacement
* 14:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:52 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22600 and previous config saved to /var/cache/conftool/dbconfig/20220315-175143-marostegui.json
* 14:42 XioNoX: fix MTUs on asw-c-codfw
* 17:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:14 godog: bump disk space in prometheus codfw k8s-ml-serve  (+30G)
* 17:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 14:14 Lucas_WMDE: UTC afternoon backport+config window done (just for the record; actual last backport was half an hour ago)
* 17:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 13:54 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 17:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 13:52 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22599 and previous config saved to /var/cache/conftool/dbconfig/20220315-175130-marostegui.json
* 13:50 XioNoX: fix MTUs on asw-b-codfw
* 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22598 and previous config saved to /var/cache/conftool/dbconfig/20220315-175037-marostegui.json
* 13:47 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22597 and previous config saved to /var/cache/conftool/dbconfig/20220315-173625-marostegui.json
* 13:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22596 and previous config saved to /var/cache/conftool/dbconfig/20220315-173532-marostegui.json
* 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22595 and previous config saved to /var/cache/conftool/dbconfig/20220315-172616-ladsgroup.json
* 17:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 17:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22594 and previous config saved to /var/cache/conftool/dbconfig/20220315-172608-ladsgroup.json
* 17:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:25 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22593 and previous config saved to /var/cache/conftool/dbconfig/20220315-172119-marostegui.json
* 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22592 and previous config saved to /var/cache/conftool/dbconfig/20220315-172027-marostegui.json
* 17:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22591 and previous config saved to /var/cache/conftool/dbconfig/20220315-171103-ladsgroup.json
* 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22590 and previous config saved to /var/cache/conftool/dbconfig/20220315-170614-marostegui.json
* 17:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22589 and previous config saved to /var/cache/conftool/dbconfig/20220315-170201-marostegui.json
* 17:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 17:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 17:01 jhuneidi@deploy1002: Pruned MediaWiki: 1.38.0-wmf.24 (duration: 01m 32s)
* 16:59 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]] (duration: 38m 54s)
* 16:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22588 and previous config saved to /var/cache/conftool/dbconfig/20220315-165558-ladsgroup.json
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22587 and previous config saved to /var/cache/conftool/dbconfig/20220315-164751-marostegui.json
* 16:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 16:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22586 and previous config saved to /var/cache/conftool/dbconfig/20220315-164743-marostegui.json
* 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22585 and previous config saved to /var/cache/conftool/dbconfig/20220315-164053-ladsgroup.json
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22584 and previous config saved to /var/cache/conftool/dbconfig/20220315-163626-ladsgroup.json
* 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22583 and previous config saved to /var/cache/conftool/dbconfig/20220315-163618-ladsgroup.json
* 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22582 and previous config saved to /var/cache/conftool/dbconfig/20220315-163238-marostegui.json
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22581 and previous config saved to /var/cache/conftool/dbconfig/20220315-163134-marostegui.json
* 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22580 and previous config saved to /var/cache/conftool/dbconfig/20220315-163126-marostegui.json
* 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22579 and previous config saved to /var/cache/conftool/dbconfig/20220315-162113-ladsgroup.json
* 16:20 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.26  refs [[phab:T300202|T300202]]
* 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22578 and previous config saved to /var/cache/conftool/dbconfig/20220315-161732-marostegui.json
* 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22577 and previous config saved to /var/cache/conftool/dbconfig/20220315-161621-marostegui.json
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22576 and previous config saved to /var/cache/conftool/dbconfig/20220315-160607-ladsgroup.json
* 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22575 and previous config saved to /var/cache/conftool/dbconfig/20220315-160226-marostegui.json
* 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22574 and previous config saved to /var/cache/conftool/dbconfig/20220315-160116-marostegui.json
* 15:53 moritzm: updating Exim on mx1001 [[phab:T303738|T303738]]
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22573 and previous config saved to /var/cache/conftool/dbconfig/20220315-155102-ladsgroup.json
* 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22572 and previous config saved to /var/cache/conftool/dbconfig/20220315-154639-ladsgroup.json
* 15:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22571 and previous config saved to /var/cache/conftool/dbconfig/20220315-154631-ladsgroup.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22570 and previous config saved to /var/cache/conftool/dbconfig/20220315-154610-marostegui.json
* 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22569 and previous config saved to /var/cache/conftool/dbconfig/20220315-153126-ladsgroup.json
* 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22568 and previous config saved to /var/cache/conftool/dbconfig/20220315-152916-marostegui.json
* 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 15:18 moritzm: installing Java updates on wcqs*/wdqs* hosts
* 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22567 and previous config saved to /var/cache/conftool/dbconfig/20220315-151621-ladsgroup.json
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22566 and previous config saved to /var/cache/conftool/dbconfig/20220315-151206-marostegui.json
* 15:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 15:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 15:09 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@f01214c]: (no justification provided) (duration: 00m 07s)
* 15:09 ebysans@deploy1002: Started deploy [airflow-dags/analytics@f01214c]: (no justification provided)
* 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22565 and previous config saved to /var/cache/conftool/dbconfig/20220315-150649-root.json
* 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22564 and previous config saved to /var/cache/conftool/dbconfig/20220315-150116-ladsgroup.json
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22563 and previous config saved to /var/cache/conftool/dbconfig/20220315-145246-ladsgroup.json
* 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22562 and previous config saved to /var/cache/conftool/dbconfig/20220315-145238-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22561 and previous config saved to /var/cache/conftool/dbconfig/20220315-145146-root.json
* 14:50 moritzm: installing postgresql-11 security updates
* 14:49 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@88d5618]: (no justification provided) (duration: 00m 07s)
* 14:49 ntsako@deploy1002: Started deploy [airflow-dags/analytics@88d5618]: (no justification provided)
* 14:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:42 ottomata: I read the cumin output wrong, kafka-jumbo1001 and 1002 restarted successfully before accidental ctrl-c on cumin command.  Restarting the full jumbo roll-restart to thoroughly do them all - [[phab:T303324|T303324]]
* 14:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
* 14:39 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:38 ottomata: all brokers except kafka-jumbo1001 were succesffully roll restarted, doing kafka-jumbo1001 manually - [[phab:T303324|T303324]]
* 14:37 ottomata: accidental cancel of roll restart brokers, re-doing - [[phab:T303324|T303324]]
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22560 and previous config saved to /var/cache/conftool/dbconfig/20220315-143733-ladsgroup.json
* 14:37 otto@cumin1001: END (ERROR) - Cookbook sre.kafka.roll-restart-brokers (exit_code=97) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22559 and previous config saved to /var/cache/conftool/dbconfig/20220315-143642-root.json
* 14:32 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@2924232]: (no justification provided) (duration: 00m 08s)
* 14:32 ntsako@deploy1002: Started deploy [airflow-dags/analytics@2924232]: (no justification provided)
* 14:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1068.eqiad.wmnet with OS stretch
* 14:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 14:22 inflatador: [[phab:T303256|T303256]] bking@cumin1001 restarting wdqs services `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-blazegraph`
* 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22558 and previous config saved to /var/cache/conftool/dbconfig/20220315-142228-ladsgroup.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22557 and previous config saved to /var/cache/conftool/dbconfig/20220315-142138-root.json
* 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
* 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
* 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22556 and previous config saved to /var/cache/conftool/dbconfig/20220315-140723-ladsgroup.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22555 and previous config saved to /var/cache/conftool/dbconfig/20220315-140634-root.json
* 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22554 and previous config saved to /var/cache/conftool/dbconfig/20220315-140520-marostegui.json
* 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22553 and previous config saved to /var/cache/conftool/dbconfig/20220315-140259-ladsgroup.json
* 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22552 and previous config saved to /var/cache/conftool/dbconfig/20220315-140252-ladsgroup.json
* 14:01 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:00 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 13:59 ottomata: roll restarting kafka jumbo brokers to set max.incremental.fetch.session.cache.slots=2000 - [[phab:T303324|T303324]]
* 13:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
* 13:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22551 and previous config saved to /var/cache/conftool/dbconfig/20220315-135015-marostegui.json
* 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22550 and previous config saved to /var/cache/conftool/dbconfig/20220315-134747-ladsgroup.json
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 13:41 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 awight: EU deployment complete
* 13:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:38 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791724{{!}}thwikibooks: set wgRestrictDisplayTitle to false (T308375)]] (duration: 00m 50s)
* 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript updateArticleCount.php thwikibooks --update # [[phab:T308376|T308376]] [basically instantaneous, 1558 articles]
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22549 and previous config saved to /var/cache/conftool/dbconfig/20220315-133510-marostegui.json
* 13:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791722{{!}}thwikibooks: Add NS 104 and 106 to wgContentNamespaces (T308376)]] (duration: 00m 53s)
* 13:34 awight@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:770907{{!}}[beta] Disable improved template search (T286991, T302857)]] (take 2) (duration: 00m 50s)
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 awight@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:770907{{!}}[beta] Disable improved template search (T286991, T302857)]] (duration: 00m 48s)
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22548 and previous config saved to /var/cache/conftool/dbconfig/20220315-133241-ladsgroup.json
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:24 godog: free up space on thanos-be2001 on /var/log/spool/rsyslog
* 13:31 awight@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:770906{{!}}[beta] Remove unused config overrides]] (duration: 00m 49s)
* 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791717{{!}}thwikibooks: Enable babel categorize (T308378)]] (duration: 00m 52s)
* 13:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22547 and previous config saved to /var/cache/conftool/dbconfig/20220315-132857-marostegui.json
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22546 and previous config saved to /var/cache/conftool/dbconfig/20220315-132005-marostegui.json
* 12:43 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
* 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22545 and previous config saved to /var/cache/conftool/dbconfig/20220315-131736-ladsgroup.json
* 12:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 13:15 awight@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/TemplateWizard/resources/ext.TemplateWizard.SearchField.js: Backport: [[gerrit:770057{{!}}Fix copy-paste mistake in template search widget (T303524)]] (duration: 00m 49s)
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1095:3315', diff saved to https://phabricator.wikimedia.org/P22544 and previous config saved to /var/cache/conftool/dbconfig/20220315-131436-marostegui.json
* 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22543 and previous config saved to /var/cache/conftool/dbconfig/20220315-131352-marostegui.json
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22542 and previous config saved to /var/cache/conftool/dbconfig/20220315-131311-ladsgroup.json
* 12:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 12:21 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 49s)
* 13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 12:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 48s)
* 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22541 and previous config saved to /var/cache/conftool/dbconfig/20220315-131303-ladsgroup.json
* 12:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:13 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating kcgwiki ([[phab:T305279|T305279]])
* 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22540 and previous config saved to /var/cache/conftool/dbconfig/20220315-130936-marostegui.json
* 12:11 urbanecm@deploy1002: Synchronized dblists: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 50s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:07 Amir1: removed 440 more corrupt rows in flaggedtemplates in dewiki ([[phab:T297189|T297189]])
* 12:10 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22539 and previous config saved to /var/cache/conftool/dbconfig/20220315-125847-marostegui.json
* 11:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1081.eqiad.wmnet with reason: [[phab:T308267|T308267]]
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22538 and previous config saved to /var/cache/conftool/dbconfig/20220315-125758-ladsgroup.json
* 11:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1081.eqiad.wmnet with reason: [[phab:T308267|T308267]]
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22537 and previous config saved to /var/cache/conftool/dbconfig/20220315-125431-marostegui.json
* 11:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P22536 and previous config saved to /var/cache/conftool/dbconfig/20220315-125228-marostegui.json
* 11:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
* 12:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 11:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
* 12:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 11:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
* 12:48 Amir1: removed 170 corrupt rows in flaggedtemplates in dewiki ([[phab:T297189|T297189]])
* 11:26 XioNoX: asw2-ulsfo fix MTU on 2 interfaces
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22535 and previous config saved to /var/cache/conftool/dbconfig/20220315-124342-marostegui.json
* 11:09 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes: Backport: [[gerrit:792126{{!}}RestrictionStore: Add support for templatelinks migration (T308207)]] (duration: 00m 54s)
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22534 and previous config saved to /var/cache/conftool/dbconfig/20220315-124253-ladsgroup.json
* 11:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22533 and previous config saved to /var/cache/conftool/dbconfig/20220315-123926-marostegui.json
* 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22532 and previous config saved to /var/cache/conftool/dbconfig/20220315-122748-ladsgroup.json
* 11:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:24 moritzm: updating Exim on mx2001 [[phab:T303738|T303738]]
* 11:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22531 and previous config saved to /var/cache/conftool/dbconfig/20220315-122421-marostegui.json
* 10:57 vgutierrez: test HAProxy 2.4.17 on cp4026 and cp4032
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22530 and previous config saved to /var/cache/conftool/dbconfig/20220315-121317-ladsgroup.json
* 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: Maintenance
* 10:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22529 and previous config saved to /var/cache/conftool/dbconfig/20220315-121309-ladsgroup.json
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22528 and previous config saved to /var/cache/conftool/dbconfig/20220315-115804-ladsgroup.json
* 08:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22527 and previous config saved to /var/cache/conftool/dbconfig/20220315-114259-ladsgroup.json
* 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22526 and previous config saved to /var/cache/conftool/dbconfig/20220315-112754-ladsgroup.json
* 07:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T298743|T298743]])', diff saved to https://phabricator.wikimedia.org/P22525 and previous config saved to /var/cache/conftool/dbconfig/20220315-112308-ladsgroup.json
* 07:58 urbanecm: UTC morning B&C window done
* 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e9a00e8}}: GrowthExperiments: Update campaigns configuration ([[phab:T305443|T305443]], [[phab:T305659|T305659]], [[phab:T307521|T307521]]) (duration: 00m 50s)
* 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 07:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: Maintenance
* 07:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dc82dfa8}}: ptwikinews: Enable extension MediaSearch ([[phab:T299872|T299872]]) (duration: 00m 48s)
* 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 07:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Maintenance
* 07:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Maintenance
* 07:44 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|57d4a9c}}: thwikibooks: Enable quiz extension ([[phab:T308377|T308377]]) (duration: 00m 48s)
* 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 07:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: Maintenance
* 07:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3e04f86}}: thwikibooks: Add more namespaces to wgNamespacesToBeSearchedDefault ([[phab:T308373|T308373]]) (duration: 00m 48s)
* 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 07:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22524 and previous config saved to /var/cache/conftool/dbconfig/20220315-110423-marostegui.json
* 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 07:36 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|67ce6ce}}: zhwikisource: Add NS100 to wgNamespacesToBeSearchedDefault ([[phab:T308393|T308393]]) (duration: 00m 50s)
* 11:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 07:18 dcausse: restarting blazegraph on wdqs1007 (BlazegraphFreeAllocatorsDecreasingRapidly)
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22523 and previous config saved to /var/cache/conftool/dbconfig/20220315-110416-marostegui.json
 
* 10:50 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
== 2022-05-15 ==
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22522 and previous config saved to /var/cache/conftool/dbconfig/20220315-104910-marostegui.json
* 21:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 07s)
* 10:49 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
* 21:46 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22521 and previous config saved to /var/cache/conftool/dbconfig/20220315-103405-marostegui.json
* 21:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 07s)
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22520 and previous config saved to /var/cache/conftool/dbconfig/20220315-101922-root.json
* 21:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22519 and previous config saved to /var/cache/conftool/dbconfig/20220315-101900-marostegui.json
* 21:39 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22518 and previous config saved to /var/cache/conftool/dbconfig/20220315-101449-root.json
* 21:39 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 10:13 Amir1: start of foreachwikiindblist all maintenance/refreshImageMetadata.php --force --verbose --mediatype=AUDIO --sleep 2 --oldimage ([[phab:T226311|T226311]])
* 21:30 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22517 and previous config saved to /var/cache/conftool/dbconfig/20220315-100418-root.json
* 21:30 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22516 and previous config saved to /var/cache/conftool/dbconfig/20220315-095945-root.json
* 21:14 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22515 and previous config saved to /var/cache/conftool/dbconfig/20220315-094914-root.json
* 21:14 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22514 and previous config saved to /var/cache/conftool/dbconfig/20220315-094441-root.json
 
* 09:38 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
== 2022-05-14 ==
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22513 and previous config saved to /var/cache/conftool/dbconfig/20220315-093410-root.json
* 08:34 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1172', diff saved to https://phabricator.wikimedia.org/P27830 and previous config saved to /var/cache/conftool/dbconfig/20220514-083421-jynus.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22512 and previous config saved to /var/cache/conftool/dbconfig/20220315-092937-root.json
* 00:53 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Server need to be downgraded to stretch, on monday
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22511 and previous config saved to /var/cache/conftool/dbconfig/20220315-091906-root.json
* 00:53 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Server need to be downgraded to stretch, on monday
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22510 and previous config saved to /var/cache/conftool/dbconfig/20220315-091850-root.json
 
* 09:14 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
== 2022-05-13 ==
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22509 and previous config saved to /var/cache/conftool/dbconfig/20220315-091433-root.json
* 23:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1007.eqiad.wmnet with reason: Upgrade turnilo
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22507 and previous config saved to /var/cache/conftool/dbconfig/20220315-090346-root.json
* 23:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1007.eqiad.wmnet with reason: Upgrade turnilo
* 09:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:14 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@bf60521]: Staging deployment of turnilo 1.35 (duration: 00m 08s)
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:13 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@bf60521]: Staging deployment of turnilo 1.35
* 08:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1003.wikimedia.org
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22506 and previous config saved to /var/cache/conftool/dbconfig/20220315-085929-root.json
* 17:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1003.wikimedia.org
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1004.wikimedia.org
* 08:57 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a85cf25] (eqiad): Switchover to eqiad tegola on eqiad env (duration: 01m 55s)
* 17:24 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 08:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a85cf25] (eqiad): Switchover to eqiad tegola on eqiad env
* 17:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudservices1004.wikimedia.org
* 08:53 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a85cf25] (codfw): Switchover to eqiad tegola on eqiad env (duration: 03m 22s)
* 17:24 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22505 and previous config saved to /var/cache/conftool/dbconfig/20220315-085214-marostegui.json
* 15:57 _joe_: uploading conftool 2.2.0 to buster, bullseye [[phab:T305824|T305824]] [[phab:T305582|T305582]] [[phab:T305607|T305607]] [[phab:T305638|T305638]] [[phab:T307905|T307905]] [[phab:T308100|T308100]]
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 12:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 12:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22504 and previous config saved to /var/cache/conftool/dbconfig/20220315-085206-marostegui.json
* 12:37 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1095:3315', diff saved to https://phabricator.wikimedia.org/P22503 and previous config saved to /var/cache/conftool/dbconfig/20220315-085026-marostegui.json
* 12:37 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 08:50 marostegui: dbmaint on s5@eqiad [[phab:T297189|T297189]]
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2140 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P27824 and previous config saved to /var/cache/conftool/dbconfig/20220513-121832-marostegui.json
* 08:49 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a85cf25] (codfw): Switchover to eqiad tegola on eqiad env
* 12:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22502 and previous config saved to /var/cache/conftool/dbconfig/20220315-084842-root.json
* 11:59 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22501 and previous config saved to /var/cache/conftool/dbconfig/20220315-084425-root.json
* 11:57 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1161.eqiad.wmnet with OS bullseye
* 11:47 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22500 and previous config saved to /var/cache/conftool/dbconfig/20220315-083925-marostegui.json
* 11:40 moritzm: installing idp-test1002 [[phab:T308214|T308214]]
* 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 10:55 moritzm: installing idp-test2002 [[phab:T308214|T308214]]
* 08:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22499 and previous config saved to /var/cache/conftool/dbconfig/20220315-083917-marostegui.json
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22498 and previous config saved to /var/cache/conftool/dbconfig/20220315-083701-marostegui.json
* 10:18 vgutierrez: disable puppet on gerrit1001 to fix /etc/ssh/ssh_config
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22497 and previous config saved to /var/cache/conftool/dbconfig/20220315-083338-root.json
* 08:39 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
* 08:03 jynus: moving s2 database from db2101 to db2097 [[phab:T299920|T299920]]
* 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
* 07:59 moritzm: draining ganeti4002 [[phab:T307997|T307997]]
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22496 and previous config saved to /var/cache/conftool/dbconfig/20220315-082412-marostegui.json
* 07:52 XioNoX: add init7 transit in drmrs
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22495 and previous config saved to /var/cache/conftool/dbconfig/20220315-082401-root.json
* 07:39 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4001.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22493 and previous config saved to /var/cache/conftool/dbconfig/20220315-082157-marostegui.json
* 07:39 root@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4001.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22492 and previous config saved to /var/cache/conftool/dbconfig/20220315-081835-root.json
* 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4001.ulsfo.wmnet
* 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1161.eqiad.wmnet with OS bullseye
* 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22491 and previous config saved to /var/cache/conftool/dbconfig/20220315-080907-marostegui.json
* 07:18 Amir1: start of mwscript extensions/Echo/maintenance/removeOrphanedEvents.php --wiki=wikidatawiki --force ([[phab:T308084|T308084]])
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22490 and previous config saved to /var/cache/conftool/dbconfig/20220315-080857-root.json
* 02:14 ejegg: updated payments-wiki from {{Gerrit|8f46af9d}} to {{Gerrit|590fac28}}
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22489 and previous config saved to /var/cache/conftool/dbconfig/20220315-080651-marostegui.json
* 08:05 marostegui: dbmaint on s5@eqiad [[phab:T300473|T300473]]
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22488 and previous config saved to /var/cache/conftool/dbconfig/20220315-080329-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P22487 and previous config saved to /var/cache/conftool/dbconfig/20220315-080128-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22486 and previous config saved to /var/cache/conftool/dbconfig/20220315-075402-marostegui.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22485 and previous config saved to /var/cache/conftool/dbconfig/20220315-075353-root.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22484 and previous config saved to /var/cache/conftool/dbconfig/20220315-074825-root.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22483 and previous config saved to /var/cache/conftool/dbconfig/20220315-074650-root.json
* 07:43 elukey: restart kube-api server on ml-serve-ctrl2002 - 504 responses registered, corresponding to high custom resource definition requests
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22482 and previous config saved to /var/cache/conftool/dbconfig/20220315-073849-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22481 and previous config saved to /var/cache/conftool/dbconfig/20220315-073146-root.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22480 and previous config saved to /var/cache/conftool/dbconfig/20220315-072345-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22479 and previous config saved to /var/cache/conftool/dbconfig/20220315-071642-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22478 and previous config saved to /var/cache/conftool/dbconfig/20220315-070841-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22477 and previous config saved to /var/cache/conftool/dbconfig/20220315-070635-marostegui.json
* 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22476 and previous config saved to /var/cache/conftool/dbconfig/20220315-070138-root.json
* 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1166.eqiad.wmnet with OS bullseye
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22475 and previous config saved to /var/cache/conftool/dbconfig/20220315-065337-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22474 and previous config saved to /var/cache/conftool/dbconfig/20220315-064634-root.json
* 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
* 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22473 and previous config saved to /var/cache/conftool/dbconfig/20220315-063130-root.json
* 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1166.eqiad.wmnet with OS bullseye
* 06:26 marostegui: dbmaint on s3@eqiad [[phab:T300600|T300600]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P22472 and previous config saved to /var/cache/conftool/dbconfig/20220315-062543-marostegui.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22471 and previous config saved to /var/cache/conftool/dbconfig/20220315-061626-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22470 and previous config saved to /var/cache/conftool/dbconfig/20220315-061458-marostegui.json
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22469 and previous config saved to /var/cache/conftool/dbconfig/20220315-061450-marostegui.json
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22468 and previous config saved to /var/cache/conftool/dbconfig/20220315-055945-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22467 and previous config saved to /var/cache/conftool/dbconfig/20220315-054440-marostegui.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22466 and previous config saved to /var/cache/conftool/dbconfig/20220315-052935-marostegui.json
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22465 and previous config saved to /var/cache/conftool/dbconfig/20220315-013013-marostegui.json
* 01:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 01:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22464 and previous config saved to /var/cache/conftool/dbconfig/20220315-013000-marostegui.json
* 01:26 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/maintenance/populateGlobalEditCount.php: fix script bug gerrit 770058 (duration: 00m 50s)
* 01:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22463 and previous config saved to /var/cache/conftool/dbconfig/20220315-011455-marostegui.json
* 00:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22462 and previous config saved to /var/cache/conftool/dbconfig/20220315-005950-marostegui.json
* 00:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22461 and previous config saved to /var/cache/conftool/dbconfig/20220315-004445-marostegui.json
* 00:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)


== 2022-03-14 ==
== 2022-05-12 ==
* 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22460 and previous config saved to /var/cache/conftool/dbconfig/20220314-234430-marostegui.json
* 21:56 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@a2bdc3e]: (no justification provided) (duration: 02m 08s)
* 23:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 21:53 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@a2bdc3e]: (no justification provided)
* 23:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 21:43 robh: cp306[23] returned to service, cp306[45] coming down for firmware update via [[phab:T243167|T243167]]
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 robh: cp306[01] returned to service, cp306[23] coming down for firmware update via [[phab:T243167|T243167]]
* 23:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:59 brennen: utc late backport & config window closed
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:50 robh: resuming last 6 esams cp host firmware updates via [[phab:T243167|T243167]].  cp306[01] going offline
* 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:50 Krinkle: krinkle@mwmaint1002$ mwscript refreshLinks.php --wiki commonswiki --category 'Media_needing_categories_requiring_human_attention' (approximately 2000 tiny pages)
* 22:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:28 ryankemper: [[phab:T301108|T301108]] `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "moving away from legacy updater" --blazegraph_instance wikidata --without-lvs --task-id [[phab:T301108|T301108]]` on tmux `wdqs`
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
* 20:39 brennen@deploy1002: Finished scap: Backport for [[gerrit:791430]] viwiki: Enable "upload_by_url" for sysop (duration: 01m 36s)
* 22:16 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
* 20:37 brennen@deploy1002: Started scap: Backport for [[gerrit:791430]] viwiki: Enable "upload_by_url" for sysop
* 22:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 22:03 bking@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs-internal,name=eqiad
* 22:03 bking@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 22:03 inflatador: [[phab:T302494|T302494]] bking@puppetmaster1001 depooling eqiad in DNS-discovery for wdqs and wdqs-internal services
* 21:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:38 inflatador: [[phab:T302494|T302494]] bking@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=codfw
* 21:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:37 bking@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw
* 21:36 inflatador: bking@cumin pooling codfw in DNS-discovery for wdqs and wdqs-internal services
* 21:31 sbassett: Deployed security fix for [[phab:T160800|T160800]]
* 21:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
* 21:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:54 urbanecm: UTC late B&C completed
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bca9c94c9d0bec83cb777bc474fde564c441349c}}: liwiktionary: Change timezone to CET/CEST ([[phab:T303734|T303734]]) (duration: 00m 49s)
* 20:45 ebernhardson@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CirrusSearch/profiles/SaneitizeProfiles.config.php: Backport: [[gerrit:770056{{!}}Cut saneitizer re-indexing rate in half (T302733)]] (duration: 00m 49s)
* 20:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
* 20:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:32 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791424{{!}}ruwiktionary: Add localized mobile wordmark (T308233)]] (duration: 00m 50s)
* 20:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:31 brennen@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-ru.svg: Config: [[gerrit:791424{{!}}ruwiktionary: Add localized mobile wordmark (T308233)]] (duration: 00m 49s)
* 20:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:785229]] Enable "upload_by_url" feature on zhwiki (duration: 01m 46s)
* 20:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:23 brennen@deploy1002: Started scap: Backport for [[gerrit:785229]] Enable "upload_by_url" feature on zhwiki
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 brennen@deploy1002: backport aborted:  (duration: 02m 05s)
* 20:17 brennen@deploy1002: prep aborted:  (duration: 00m 01s)
* 19:57 hashar: Restarting Gerrit
* 19:53 mutante: gitlab2001 - systemctl start backup-restore -  systemd[1]: Started GitLab Backup Restore. after gerrit:791410  for [[phab:T308089|T308089]]
* 18:57 jelto: restart gitlab2001
* 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:26 krinkle@deploy1002: Synchronized w/static.php: {{Gerrit|Ic0a5eae4f721a16403071d1b2136cf23d78e4fa9}} (duration: 00m 49s)
* 18:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4001.ulsfo
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22418 and previous config saved to /var/cache/conftool/dbconfig/20220314-131220-marostegui.json
* 13:12 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:787690{{!}}Enable CodeMirror colorblind-friendly palette (T306867)]] (duration: 00m 51s)
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:10 urbanecm@deploy1002: Synchronized wmf-config/wikitech.php: {{Gerrit|95f376a}}: wikitech: migrate wmf* to wmg* ([[phab:T45956|T45956]]) (duration: 00m 48s)
* 13:
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22417 and previous config saved to /var/cache/conftool/dbconfig/20220314-125839-marostegui.json
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22416 and previous config saved to /var/cache/conftool/dbconfig/20220314-124911-marostegui.json
* 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22415 and previous config saved to /var/cache/conftool/dbconfig/20220314-124902-marostegui.json
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22414 and previous config saved to /var/cache/conftool/dbconfig/20220314-123357-marostegui.json
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T298563|T298563]])', diff saved to https://phabricator.wikimedia.org/P22413 and previous config saved to /var/cache/conftool/dbconfig/20220314-121937-marostegui.json
* 12:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for
* 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:19 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 1: [[gerrit:769386{{!}}Enable SectionTranslation on Javanese, Tagalog, Mongolian, Telugu WPs (T298237)]] (duration: 00m 50s)
* 08:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) for reboot', diff saved to https://phabricator.wikimedia.org/P22256 and previous config saved to /var/cache/conftool/dbconfig/20220310-081244-marostegui.json
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22255 and previous config saved to /var/cache/conftool/dbconfig/20220310-081129-marostegui.json
* 08:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22254 and previous config saved to /var/cache/conftool/dbconfig/20220310-080718-marostegui.json
* 08:10 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790430{{!}}Revert "Revert "Set arwiki to read new in templatelinks migration""]] (duration: 00m 48s)
* 08:03 marostegui: Reboot dbproxy1017 1016 [[phab:T303174|
* 08:06 ladsgroup@deploy1002: Synchronized wmf-config: Config: [[gerrit:787698{{!}}Clean up unnecessary two-level setting]] (duration: 00m 48s)
* 08:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790618{{!}}Stop writing to revision_actor_temp table everywhere (T275246)]] (duration: 00m 48s)
* 08:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:00 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryLinks.php: Backport: [[gerrit:790427{{!}}api: Add support for linksmigration in ApiQueryLinks (T304780)]] (duration: 00m 49s)
* 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:58 moritzm: installing qemu security updates
* 07:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:55 moritzm: failover ganeti master for drmrs/B12 to ganeti6001
* 07:55 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:787698{{!}}Clean up unnecessary two-level setting]] (duration: 00m 49s)
* 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 07:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:48 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:786289{{!}}Watch for mapdata cache misses in production (T304813 T300712)]] (duration: 00m 50s)
* 07:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 07:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 07:23 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 07:17 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Naray-ctr out of all services on: 1223 hosts
* 07:16 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Naray-ctr out of all services on: 1223 hosts
* 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Naray-ctr out of all services on: 531 hosts
* 07:15 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Naray-ctr out of all services on: 531 hosts
* 06:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:56 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 06:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 06:56 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 06:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast3005.wikimedia.org
* 06:47 jmm@cumin2002: START - Cookbook sre.ganeti


== 2022-03-09 ==
== 2022-05-04 ==
* 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P27565 and previous config saved to /var/cache/conftool/dbconfig/20220504-235020-ladsgroup.json
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27564 and previous config saved to /var/cache/conftool/dbconfig/20220504-235000-ladsgroup.json
* 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:09 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25  refs [[phab:T300201|T300201]] (duration: 00m 49s)
* 23:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 23:08 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 23:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 23:08 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27563 and previous config saved
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 10:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27429 and previous config saved to /var/cache/conftool/dbconfig/20220504-102303-ladsgroup.json
* 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 10:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 10:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
* 10:16 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22192 and previous config saved to /var/cache/conftool/dbconfig/20220309-101610-marostegui.json
* 10:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2044.codfw.wmnet with OS bullseye
* 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:08 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:769400{{!}}reenable DPL on nowikimedia]] (duration: 00m 51s)
* 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22191 and previous config saved to /var/cache/conftool/dbconfig/20220309-100036-marostegui.json
* 10:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1008.eqiad.wmnet
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2147', diff saved to https://phabricator.wikimedia.org/P22190 and previous config saved to /var/cache/conftool/dbconfig/20220309-094704-marostegui.json
* 10:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:45 marostegui: dbmaint on s7@eqiad [[phab:T298295|T298295]]
* 09:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1008.eqiad.wmnet
* 09:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22189 and previous config saved to /var/cache/conftool/dbconfig/20220309-094501-marostegui.json
* 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27427 and previous config saved to /var/cache/conftool/dbconfig/20220504-094900-ladsgroup.json
* 09:31 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22188 and previous config saved to /var/cache/conftool/dbconfig/20220309-093119-marostegui.json
* 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:30 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 09:30 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27426 and previous config saved to /var/cache/conftool/dbconfig/20220504-094852-ladsgroup.json
* 09:27 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22187 and previous config saved to /var/cache/conftool/dbconfig/20220309-092731-marostegui.json
* 09:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:26 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2044.codfw.wmnet with reason: host reimage
* 09:26 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:36 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2044.codfw.wmnet with reason: host reimage
* 09:23 marostegui: dbmaint on s2@eqiad [[phab:T298295|T298295]]
* 09:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:18 marostegui: dbmaint on s1@eqiad [[phab:T298295|T298295]]
* 09:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:16 marostegui: dbmaint on s4@eqiad [[phab:T298295|T298295]]
* 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27425 and previous config saved to /var/cache/conftool/dbconfig/20220504-093347-ladsgroup.json
* 09:07 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27424 and previous config saved to /var/cache/conftool/dbconfig/20220504-092833-ladsgroup.json
* 09:07 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 09:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22186 and previous config saved to /var/cache/conftool/dbconfig/20220309-090737-marostegui.json
* 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 09:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 09:22 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
* 08:53 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
* 09:20 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2044.codfw.wmnet with OS bullseye
* 08:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22184 and previous config saved to /var/cache/conftool/dbconfig/20220309-085201-marostegui.json
* 09:19 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.39.0-wmf.10  refs [[phab:T305216|T305216]]
* 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 09:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
* 08:49 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
* 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27423 and previous config saved to /var/cache/conftool/dbconfig/20220504-091842-ladsgroup.json
* 08:46 XioNoX: Redirect one of Microsoft's range to codfw - [[phab:T282861|T282861]]
* 09:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
* 09:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 09:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22183 and previous config saved to /var/cache/conftool/dbconfig/20220309-083626-marostegui.json
* 09:13 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.10  refs [[phab:T305216|T305216]] (duration: 02m 00s)
* 08:21 marostegui: dbmaint on s3@eqiad [[phab:T300380|T300380]]
* 09:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
* 08:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22182 and previous config saved to /var/cache/conftool/dbconfig/20220309-082051-marostegui.json
* 09:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.10  refs [[phab:T305216|T305216]]
* 08:11 marostegui: dbmaint on s7@eqiad [[phab:T300380|T300380]]
* 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:03 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1123 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22181 and previous config saved to /var/cache/conftool/dbconfig/20220309-080307-marostegui.json
* 09:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
* 08:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 40%: After schema change', diff saved to https://phabricator.wikimedia.org/P22180 and previous config saved to /var/cache/conftool/dbconfig/20220309-075704-root.json
* 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27422 and previous config saved to /var/cache/conftool/dbconfig/20220504-090337-ladsgroup.json
* 07:55 marostegui: dbmaint on s2@eqiad [[phab:T300380|T300380]]
* 09:00 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.10 - [[phab:T305216|T305216]] [[phab:T307513|T307513]]
* 07:49 marostegui: dbmaint on s8@eqiad [[phab:T300380|T300380]]
* 08:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
* 07:49 marostegui: dbmaint on s4@eqiad [[phab:T300380|T300380]]
* 08:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1002.eqiad.wmnet
* 07:42 marostegui: dbmaint on s1@eqiad [[phab:T300380|T300380]]
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:42 marostegui: dbmaint on s6@eqiad [[phab:T300380|T300380]]
* 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27421 and previous config saved to /var/cache/conftool/dbconfig/20220504-085655-ladsgroup.json
* 07:42 marostegui: dbmaint on s5@eqiad [[phab:T300380|T300380]]
* 08:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 07:42 marostegui: dbmaint on s5 [[phab:T300380|T300380]]
* 08:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 07:42 marostegui: dbmaint on s6 [[phab:T300380|T300380]]
* 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27420 and previous config saved to /var/cache/conftool/dbconfig/20220504-085647-ladsgroup.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22179 and previous config saved to /var/cache/conftool/dbconfig/20220309-074200-root.json
* 08:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1002.eqiad.wmnet
* 07:41 marostegui: dbmaint on s1 [[phab:T300380|T300380]]
* 08:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1002.eqiad.wmnet
* 07:41 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22178 and previous config saved to /var/cache/conftool/dbconfig/20220309-074107-root.json
* 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:34 marostegui: dbmaint on s7@eqiad [[phab:T300775|T300775]]
* 08:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:33 marostegui: dbmaint on db1123 s3@eqiad [[phab:T300600|T300600]]
* 08:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1002.eqiad.wmnet
* 07:31 elukey: manually sync pcc facts following https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production
* 08:43 taavi@deploy1002: Synchronized php-1.39.0-wmf.9/extensions/TitleBlacklist/extension.json: Backport: [[gerrit:789095{{!}}wmf.9 HACK: add forward class alias for TitleBlacklistEntry too (T307513)]] (duration: 00m 49s)
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 15%: After schema change', diff saved to https://phabricator.wikimedia.org/P22177 and previous config saved to /var/cache/conftool/dbconfig/20220309-072656-root.json
* 08:43 taavi@deploy1002: Synchronized php-1.39.0-wmf.9/extensions/TitleBlacklist/includes/TitleBlacklistEntry.php: Backport: [[gerrit:789095{{!}}wmf.9 HACK: add forward class alias for TitleBlacklistEntry too (T307513)]] (duration: 00m 50s)
* 07:25 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22176 and previous config saved to /var/cache/conftool/dbconfig/20220309-072540-root.json
* 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27417 and previous config saved to /var/cache/conftool/dbconfig/20220504-084142-ladsgroup.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P22175 and previous config saved to /var/cache/conftool/dbconfig/20220309-071153-root.json
* 08:41 hashar@deploy1002: deploy-promote aborted:  (duration: 00m 11s)
* 07:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22174 and previous config saved to /var/cache/conftool/dbconfig/20220309-071014-root.json
* 08:39 taavi@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/TitleBlacklist: Backport: [[gerrit:788855{{!}}Add a class_alias for TitleBlacklistEntry too (T307513)]] (duration: 00m 50s)
* 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1123.eqiad.wmnet with OS bullseye
* 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
* 08:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22173 and previous config saved to /var/cache/conftool/dbconfig/20220309-065447-root.json
* 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
* 08:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1123.eqiad.wmnet with OS bullseye
* 08:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22172 and previous config saved to /var/cache/conftool/dbconfig/20220309-062010-marostegui.json
* 08:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:19 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 08:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:19 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 08:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.10 - [[phab:T305216|T305216]] [[phab:T307513|T307513]]
* 06:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 08:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27416 and previous config saved to /var/cache/conftool/dbconfig/20220504-082637-ladsgroup.json
* 01:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22171 and previous config saved to /var/cache/conftool/dbconfig/20220309-014831-marostegui.json
* 08:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:32 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22170 and previous config saved to /var/cache/conftool/dbconfig/20220309-013256-marostegui.json
* 08:22 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.10  refs [[phab:T305216|T305216]]
* 01:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22169 and previous config saved to /var/cache/conftool/dbconfig/20220309-011721-marostegui.json
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22168 and previous config saved to /var/cache/conftool/dbconfig/20220309-010146-marostegui.json
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:53 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22167 and previous config saved to /var/cache/conftool/dbconfig/20220309-005325-marostegui.json
* 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:52 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 08:15 hashar@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/CentralAuth/includes/Special/SpecialGlobalGroupMembership.php: SpecialGlobalGroupMembership: do not call core hooks - [[phab:T307518|T307518]] (duration: 01m 09s)
* 00:52 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27415 and previous config saved to /var/cache/conftool/dbconfig/20220504-081131-ladsgroup.json
* 00:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22166 and previous config saved to /var/cache/conftool/dbconfig/20220309-005245-marostegui.json
* 08:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 00:37 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22165 and previous config saved to /var/cache/conftool/dbconfig/20220309-003710-marostegui.json
* 08:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 00:21 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22164 and previous config saved to /var/cache/conftool/dbconfig/20220309-002135-marostegui.json
* 08:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 00:06 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22163 and previous config saved to /var/cache/conftool/dbconfig/20220309-000600-marostegui.json
* 08:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 00:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22162 and previous config saved to /var/cache/conftool/dbconfig/20220309-000250-marostegui.json
* 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27414 and previous config saved to /var/cache/conftool/dbconfig/20220504-081015-ladsgroup.json
* 00:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27413 and previous config saved to /var/cache/conftool/dbconfig/20220504-080509-ladsgroup.json
* 00:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 08:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 00:00 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 00:00 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 00:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22161 and previous config saved to /var/cache/conftool/dbconfig/20220309-000025-marostegui.json
* 08:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 07:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 07:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 07:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P27412 and previous config saved to /var/cache/conftool/dbconfig/20220504-075510-ladsgroup.json
* 07:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P27411 and previous config saved to /var/cache/conftool/dbconfig/20220504-074005-ladsgroup.json
* 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27410 and previous config saved to /var/cache/conftool/dbconfig/20220504-072500-ladsgroup.json
* 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27409 and previous config saved to /var/cache/conftool/dbconfig/20220504-071035-ladsgroup.json
* 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27408 and previous config saved to /var/cache/conftool/dbconfig/20220504-071027-ladsgroup.json
* 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27407 and previous config saved to /var/cache/conftool/dbconfig/20220504-070920-ladsgroup.json
* 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27406 and previous config saved to /var/cache/conftool/dbconfig/20220504-070647-ladsgroup.json
* 07:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 07:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27405 and previous config saved to /var/cache/conftool/dbconfig/20220504-070639-ladsgroup.json
* 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P27404 and previous config saved to /var/cache/conftool/dbconfig/20220504-065522-ladsgroup.json
* 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27403 and previous config saved to /var/cache/conftool/dbconfig/20220504-065133-ladsgroup.json
* 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P27402 and previous config saved to /var/cache/conftool/dbconfig/20220504-064017-ladsgroup.json
* 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27401 and previous config saved to /var/cache/conftool/dbconfig/20220504-063628-ladsgroup.json
* 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27400 and previous config saved to /var/cache/conftool/dbconfig/20220504-062512-ladsgroup.json
* 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27399 and previous config saved to /var/cache/conftool/dbconfig/20220504-062123-ladsgroup.json
* 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27398 and previous config saved to /var/cache/conftool/dbconfig/20220504-061441-ladsgroup.json
* 06:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 06:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27397 and previous config saved to /var/cache/conftool/dbconfig/20220504-061346-ladsgroup.json
* 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27396 and previous config saved to /var/cache/conftool/dbconfig/20220504-055841-ladsgroup.json
* 05:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27395 and previous config saved to /var/cache/conftool/dbconfig/20220504-054336-ladsgroup.json
* 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27394 and previous config saved to /var/cache/conftool/dbconfig/20220504-053721-ladsgroup.json
* 05:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27393 and previous config saved to /var/cache/conftool/dbconfig/20220504-053713-ladsgroup.json
* 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27392 and previous config saved to /var/cache/conftool/dbconfig/20220504-052831-ladsgroup.json
* 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P27391 and previous config saved to /var/cache/conftool/dbconfig/20220504-052208-ladsgroup.json
* 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27390 and previous config saved to /var/cache/conftool/dbconfig/20220504-052140-ladsgroup.json
* 05:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 05:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P27389 and previous config saved to /var/cache/conftool/dbconfig/20220504-050703-ladsgroup.json
* 04:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27388 and previous config saved to /var/cache/conftool/dbconfig/20220504-045158-ladsgroup.json
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27386 and previous config saved to /var/cache/conftool/dbconfig/20220504-043343-ladsgroup.json
* 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 04:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 04:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 04:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 04:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 04:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 04:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27385 and previous config saved to /var/cache/conftool/dbconfig/20220504-042253-ladsgroup.json
* 04:08 Amir1: killed refresh job of rowiki ([[phab:T299021|T299021]])
* 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27384 and previous config saved to /var/cache/conftool/dbconfig/20220504-040747-ladsgroup.json
* 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27383 and previous config saved to /var/cache/conftool/dbconfig/20220504-040011-ladsgroup.json
* 04:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 04:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27382 and previous config saved to /var/cache/conftool/dbconfig/20220504-035242-ladsgroup.json
* 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27381 and previous config saved to /var/cache/conftool/dbconfig/20220504-033737-ladsgroup.json
* 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27380 and previous config saved to /var/cache/conftool/dbconfig/20220504-033012-ladsgroup.json
* 03:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 03:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27379 and previous config saved to /var/cache/conftool/dbconfig/20220504-033004-ladsgroup.json
* 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27378 and previous config saved to /var/cache/conftool/dbconfig/20220504-031459-ladsgroup.json
* 03:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5002.eqsin.wmnet with OS buster
* 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27377 and previous config saved to /var/cache/conftool/dbconfig/20220504-025954-ladsgroup.json
* 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27376 and previous config saved to /var/cache/conftool/dbconfig/20220504-024449-ladsgroup.json
* 02:41 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5002.eqsin.wmnet with OS buster
* 02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27375 and previous config saved to /var/cache/conftool/dbconfig/20220504-023608-ladsgroup.json
* 02:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 02:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27374 and previous config saved to /var/cache/conftool/dbconfig/20220504-023600-ladsgroup.json
* 02:32 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5002.eqsin.wmnet with OS buster
* 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27373 and previous config saved to /var/cache/conftool/dbconfig/20220504-022055-ladsgroup.json
* 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27372 and previous config saved to /var/cache/conftool/dbconfig/20220504-020549-ladsgroup.json
* 02:03 dpifke@deploy1002: Finished deploy [performance/coal@2a20d5d]: (no justification provided) (duration: 00m 06s)
* 02:03 dpifke@deploy1002: Started deploy [performance/coal@2a20d5d]: (no justification provided)
* 01:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5002.eqsin.wmnet with OS buster
* 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27371 and previous config saved to /var/cache/conftool/dbconfig/20220504-015044-ladsgroup.json
* 01:34 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp5002.eqsin.wmnet
* 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
* 01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27370 and previous config saved to /var/cache/conftool/dbconfig/20220504-010507-ladsgroup.json
* 01:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 01:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27369 and previous config saved to /var/cache/conftool/dbconfig/20220504-010459-ladsgroup.json
* 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27368 and previous config saved to /var/cache/conftool/dbconfig/20220504-004954-ladsgroup.json
* 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=ats-tls
* 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=varnish-fe
* 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=ats-be
* 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27367 and previous config saved to /var/cache/conftool/dbconfig/20220504-003449-ladsgroup.json
* 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27366 and previous config saved to /var/cache/conftool/dbconfig/20220504-001944-ladsgroup.json
* 00:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27365 and previous config saved to /var/cache/conftool/dbconfig/20220504-001326-ladsgroup.json
* 00:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: [[phab:T307525|T307525]]', diff saved to https://phabricator.wikimedia.org/P27364 and previous config saved to /var/cache/conftool/dbconfig/20220504-001205-ladsgroup.json
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2022-03-08 ==
== 2022-05-03 ==
* 23:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22160 and previous config saved to /var/cache/conftool/dbconfig/20220308-234450-marostegui.json
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: [[phab:T307525|T307525]]', diff saved to https://phabricator.wikimedia.org/P27363 and previous config saved to /var/cache/conftool/dbconfig/20220503-235701-ladsgroup.json
* 23:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22159 and previous config saved to /var/cache/conftool/dbconfig/20220308-232915-marostegui.json
* 23:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:13 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22158 and previous config saved to /var/cache/conftool/dbconfig/20220308-231340-marostegui.json
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:10 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22157 and previous config saved to /var/cache/conftool/dbconfig/20220308-231028-marostegui.json
* 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:09 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 23:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:09 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 23:50 ladsgroup@cumin1001: END (
* 23:09 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22156 and previous config saved to /var/cache/conftool/dbconfig/20220308-230949-marostegui.
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:57 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@1c598f5]: (no justification provided) (duration: 00m 04s)
* 01:57 ebysans@deploy1002: Started deploy [airflow-dags/analytics@1c598f5]: (no justification provided)
* 01:32 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@1c598f5]: (no justification provided) (duration: 00m 08s)
* 01:31 ebysans@deploy1002: Started deploy [airflow-dags/analytics@1c598f5]: (no justification provided)
* 01:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@21af07c]: (no justification provided) (duration: 00m 07s)
* 01:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics@21af07c]: (no justification provided)
* 01:11 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c47e886]: (no justification provided) (duration: 00m 04s)
* 01:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c47e886]: (no justification provided)
* 01:07 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c47e886]: (no justification provided) (duration: 00m 08s)
* 01:07 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c47e886]: (no justification provided)
* 00:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c8a753b]: (no justification provided) (duration: 00m 07s)
* 00:34 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c8a753b]: (no justification provided)
* 00:08 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b5f7840]: (no justification provided) (duration: 00m 08s)
* 00:08 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b5f7840]: (no justification provided)
 
== 2022-03-07 ==
* 23:50 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx2001.wikimedia.org with reason: reboot
* 23:50 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mx2001.wikimedia.org with reason: reboot
* 23:49 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx1001.wikimedia.org with reason: reboot
* 23:49 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mx1001.wikimedia.org with reason: reboot
* 23:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
* 23:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
* 22:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1003.wikimedia.org with OS bullseye
* 22:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
* 22:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
* 22:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
* 22:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
* 22:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
* 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
* 21:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
* 21:38 urbanecm: UTC late B&C window done
* 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: {{Gerrit|eac551c}}: Fix language alert regression ([[phab:T302018|T302018]]) (duration: 00m 50s)
* 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:22 eileen: config {{Gerrit|aa7dcd88}} -> {{Gerrit|16fa8e1c}}
* 20:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
* 20:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
* 20:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
* 19:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
* 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22016 and previous config saved to /var/cache/conftool/dbconfig/20220307-181310-marostegui.json
* 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22015 and previous config saved to /var/cache/conftool/dbconfig/20220307-175805-marostegui.json
* 17:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
* 17:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
* 17:47 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestage2002.codfw.wmnet
* 17:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
* 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22014 and previous config saved to /var/cache/conftool/dbconfig/20220307-174300-marostegui.json
* 17:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
* 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1004.wikimedia.org with OS bullseye
* 17:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1022.eqiad.wmnet
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22013 and previous config saved to /var/cache/conftool/dbconfig/20220307-172755-marostegui.json
* 17:24 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1022.eqiad.wmnet
* 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22012 and previous config saved to /var/cache/conftool/dbconfig/20220307-172134-marostegui.json
* 17:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 17:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22011 and previous config saved to /var/cache/conftool/dbconfig/20220307-172126-marostegui.json
* 17:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see [[phab:T303043|T303043]]
* 17:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see [[phab:T303043|T303043]]
* 17:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
* 17:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS buster
* 17:07 vgutierrez: pool cp3058 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 17:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
* 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22010 and previous config saved to /var/cache/conftool/dbconfig/20220307-170622-marostegui.json
* 17:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
* 16:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
* 16:58 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
* 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
* 16:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1004.wikimedia.org with OS bullseye
* 16:52 vgutierrez: depool cp5004 - [[phab:T303043|T303043]]
* 16:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
* 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22009 and previous config saved to /var/cache/conftool/dbconfig/20220307-165117-marostegui.json
* 16:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
* 16:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
* 16:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
* 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
* 16:45 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
* 16:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
* 16:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
* 16:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
* 16:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5010.eqsin.wmnet with OS buster
* 16:41 vgutierrez: pool cp5010 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 16:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
* 16:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22008 and previous config saved to /var/cache/conftool/dbconfig/20220307-163612-marostegui.json
* 16:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
* 16:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
* 16:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
* 16:29 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host graphite2003.codfw.wmnet
* 16:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 16:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
* 16:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22007 and previous config saved to /var/cache/conftool/dbconfig/20220307-162821-marostegui.json
* 16:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 16:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 16:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 16:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 16:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 16:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22006 and previous config saved to /var/cache/conftool/dbconfig/20220307-162157-marostegui.json
* 16:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
* 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
* 16:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
* 16:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS buster
* 16:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
* 16:17 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 16:16 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 16:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 16:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
* 16:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
* 16:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
* 16:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
* 16:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 16:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22005 and previous config saved to /var/cache/conftool/dbconfig/20220307-160650-marostegui.json
* 16:06 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 16:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
* 16:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
* 16:04 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
* 16:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
* 16:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 16:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 16:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 15:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
* 15:58 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
* 15:56 jayme: eqiad: kubectl -n istio-system delete po istiod-69d679d8b5-hm64j - [[phab:T303184|T303184]]
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22004 and previous config saved to /var/cache/conftool/dbconfig/20220307-155146-marostegui.json
* 15:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5010.eqsin.wmnet with OS buster
* 15:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see [[phab:T303183|T303183]]
* 15:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see [[phab:T303183|T303183]]
* 15:38 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1085.eqiad.wmnet with OS buster
* 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22003 and previous config saved to /var/cache/conftool/dbconfig/20220307-153641-marostegui.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22002 and previous config saved to /var/cache/conftool/dbconfig/20220307-153357-marostegui.json
* 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P22001 and previous config saved to /var/cache/conftool/dbconfig/20220307-153343-marostegui.json
* 15:20 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@7642d65]: (no justification provided) (duration: 00m 07s)
* 15:20 ntsako@deploy1002: Started deploy [airflow-dags/analytics@7642d65]: (no justification provided)
* 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22000 and previous config saved to /var/cache/conftool/dbconfig/20220307-151929-root.json
* 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
* 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21999 and previous config saved to /var/cache/conftool/dbconfig/20220307-151839-marostegui.json
* 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
* 15:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
* 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
* 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
* 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
* 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
* 15:09 ntsako@deploy1002: Finished deploy [airflow-dags/analytics_test@7642d65]: (no justification provided) (duration: 00m 09s)
* 15:09 ntsako@deploy1002: Started deploy [airflow-dags/analytics_test@7642d65]: (no justification provided)
* 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
* 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
* 15:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 15:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21998 and previous config saved to /var/cache/conftool/dbconfig/20220307-150426-root.json
* 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
* 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
* 15:03 vgutierrez: pool cp4030 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21997 and previous config saved to /var/cache/conftool/dbconfig/20220307-150334-marostegui.json
* 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
* 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
* 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
* 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
* 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 15:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4030.ulsfo.wmnet with OS buster
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
* 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
* 14:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 14:56 vgutierrez: depool cp1085
* 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21996 and previous config saved to /var/cache/conftool/dbconfig/20220307-144922-root.json
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21995 and previous config saved to /var/cache/conftool/dbconfig/20220307-144829-marostegui.json
* 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 14:45 vgutierrez: pool cp1085 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 14:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
* 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 14:37 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|f50c4746c5fa733929b80b036eef4eee84cf17d1}}: Revert "Change temporary logo for slwiki" ([[phab:T302661|T302661]]; 2/2) (duration: 00m 48s)
* 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:36 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|f50c4746c5fa733929b80b036eef4eee84cf17d1}}: Revert "Change temporary logo for slwiki" ([[phab:T302661|T302661]]; 1/2) (duration: 00m 49s)
* 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:35 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@46d88a2]: (no justification provided) (duration: 00m 04s)
* 14:35 ntsako@deploy1002: Started deploy [airflow-dags/analytics@46d88a2]: (no justification provided)
* 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 14:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
* 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21994 and previous config saved to /var/cache/conftool/dbconfig/20220307-143419-root.json
* 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
* 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21993 and previous config saved to /var/cache/conftool/dbconfig/20220307-143229-ladsgroup.json
* 14:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
* 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
* 14:30 moritzm: rebooting etherpad1003 (running etherpad1003) for kernel update
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
* 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 14:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
* 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
* 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21992 and previous config saved to /var/cache/conftool/dbconfig/20220307-141915-root.json
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21991 and previous config saved to /var/cache/conftool/dbconfig/20220307-141911-root.json
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|64b128459f04514cd0093745d7a83166555449b2}}: Enable reply tool by default on enwiki ([[phab:T296645|T296645]]) (duration: 00m 49s)
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21990 and previous config saved to /var/cache/conftool/dbconfig/20220307-141724-ladsgroup.json
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8f20ec9f5bd7f507580d8e8860116e3b1842ac9a}}: fawiki: Disable creating community books and remove "Create a book" link from sidebar ([[phab:T303173|T303173]]) (duration: 00m 49s)
* 14:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4030.ulsfo.wmnet with OS buster
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 14:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1085.eqiad.wmnet with OS buster
* 14:08 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|8619f5933966071cdb39097a3e0d38fdead40b66}}: etwikiquote: Update logo ([[phab:T302683|T302683]]; 3/3) (duration: 00m 49s)
* 14:07 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|8619f5933966071cdb39097a3e0d38fdead40b66}}: etwikiquote: Update logo ([[phab:T302683|T302683]]; 2/3) (duration: 00m 49s)
* 14:07 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/etwikiquote.png ([[phab:T302683|T302683]])
* 14:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|8619f5933966071cdb39097a3e0d38fdead40b66}}: etwikiquote: Update logo ([[phab:T302683|T302683]]; 1/3) (duration: 00m 50s)
* 14:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21989 and previous config saved to /var/cache/conftool/dbconfig/20220307-140408-root.json
* 14:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS buster
* 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21988 and previous config saved to /var/cache/conftool/dbconfig/20220307-140219-ladsgroup.json
* 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
* 14:00 kormat: removing cumin2001 grants from all db sections [[phab:T276589|T276589]]
* 14:00 vgutierrez: pool cp2037 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
* 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
* 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21987 and previous config saved to /var/cache/conftool/dbconfig/20220307-135614-ladsgroup.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21986 and previous config saved to /var/cache/conftool/dbconfig/20220307-134904-root.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21985 and previous config saved to /var/cache/conftool/dbconfig/20220307-134848-marostegui.json
* 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21984 and previous config saved to /var/cache/conftool/dbconfig/20220307-134840-marostegui.json
* 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21983 and previous config saved to /var/cache/conftool/dbconfig/20220307-134715-ladsgroup.json
* 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 07m 17s)
* 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21982 and previous config saved to /var/cache/conftool/dbconfig/20220307-134109-ladsgroup.json
* 13:39 aqu@deploy1002: Started deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
* 13:39 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 00m 08s)
* 13:39 aqu@deploy1002: Started deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
* 13:37 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 25m 04s)
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21981 and previous config saved to /var/cache/conftool/dbconfig/20220307-133400-root.json
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21980 and previous config saved to /var/cache/conftool/dbconfig/20220307-133335-marostegui.json
* 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21979 and previous config saved to /var/cache/conftool/dbconfig/20220307-132605-ladsgroup.json
* 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1142.eqiad.wmnet with OS bullseye
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21978 and previous config saved to /var/cache/conftool/dbconfig/20220307-131857-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21977 and previous config saved to /var/cache/conftool/dbconfig/20220307-131830-marostegui.json
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 (s5,s6)', diff saved to https://phabricator.wikimedia.org/P21976 and previous config saved to /var/cache/conftool/dbconfig/20220307-131606-marostegui.json
* 13:12 aqu@deploy1002: Started deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
* 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21975 and previous config saved to /var/cache/conftool/dbconfig/20220307-131100-ladsgroup.json
* 13:09 aqu_: About to deploy analytics/refinery - Migrate wikidata/item_page_link/weekly from Oozie to Airflow
* 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
* 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21974 and previous config saved to /var/cache/conftool/dbconfig/20220307-130520-ladsgroup.json
* 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21973 and previous config saved to /var/cache/conftool/dbconfig/20220307-130512-ladsgroup.json
* 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21972 and previous config saved to /var/cache/conftool/dbconfig/20220307-130326-marostegui.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21971 and previous config saved to /var/cache/conftool/dbconfig/20220307-125540-marostegui.json
* 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21970 and previous config saved to /var/cache/conftool/dbconfig/20220307-125532-marostegui.json
* 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1142.eqiad.wmnet with OS bullseye
* 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21969 and previous config saved to /var/cache/conftool/dbconfig/20220307-125007-ladsgroup.json
* 12:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly (duration: 00m 07s)
* 12:49 aqu@deploy1002: Started deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly
* 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21968 and previous config saved to /var/cache/conftool/dbconfig/20220307-124815-ladsgroup.json
* 12:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
* 12:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21967 and previous config saved to /var/cache/conftool/dbconfig/20220307-124028-marostegui.json
* 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
* 12:37 XioNoX: restart cr1-drmrs for software upgrade
* 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21966 and previous config saved to /var/cache/conftool/dbconfig/20220307-123503-ladsgroup.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21965 and previous config saved to /var/cache/conftool/dbconfig/20220307-122523-marostegui.json
* 12:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS buster
* 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21964 and previous config saved to /var/cache/conftool/dbconfig/20220307-121958-ladsgroup.json
* 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS buster
* 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21963 and previous config saved to /var/cache/conftool/dbconfig/20220307-121443-ladsgroup.json
* 12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 12:13 vgutierrez: pool cp3060 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P21962 and previous config saved to /var/cache/conftool/dbconfig/20220307-121122-marostegui.json
* 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21961 and previous config saved to /var/cache/conftool/dbconfig/20220307-121018-marostegui.json
* 12:10 XioNoX: reboot cr2-drmrs for software upgrade
* 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21960 and previous config saved to /var/cache/conftool/dbconfig/20220307-120821-ladsgroup.json
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21959 and previous config saved to /var/cache/conftool/dbconfig/20220307-120722-ladsgroup.json
* 12:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 12:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21958 and previous config saved to /var/cache/conftool/dbconfig/20220307-120532-marostegui.json
* 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5016.eqsin.wmnet with OS buster
* 12:03 vgutierrez: pool cp5016 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
* 11:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21957 and previous config saved to /var/cache/conftool/dbconfig/20220307-115337-marostegui.json
* 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21956 and previous config saved to /var/cache/conftool/dbconfig/20220307-115316-ladsgroup.json
* 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21955 and previous config saved to /var/cache/conftool/dbconfig/20220307-115217-ladsgroup.json
* 11:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
* 11:45 XioNoX: remove MTU1400 on drmrs GTT links
* 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21954 and previous config saved to /var/cache/conftool/dbconfig/20220307-113833-marostegui.json
* 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21953 and previous config saved to /var/cache/conftool/dbconfig/20220307-113811-ladsgroup.json
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21952 and previous config saved to /var/cache/conftool/dbconfig/20220307-113712-ladsgroup.json
* 11:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
* 11:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 11:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21951 and previous config saved to /var/cache/conftool/dbconfig/20220307-112328-marostegui.json
* 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21950 and previous config saved to /var/cache/conftool/dbconfig/20220307-112307-ladsgroup.json
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21949 and previous config saved to /var/cache/conftool/dbconfig/20220307-112207-ladsgroup.json
* 11:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS buster
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21948 and previous config saved to /var/cache/conftool/dbconfig/20220307-111834-root.json
* 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21947 and previous config saved to /var/cache/conftool/dbconfig/20220307-111816-ladsgroup.json
* 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21946 and previous config saved to /var/cache/conftool/dbconfig/20220307-111809-ladsgroup.json
* 11:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4036.ulsfo.wmnet with OS buster
* 11:12 vgutierrez: pool cp4036 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5016.eqsin.wmnet with OS buster
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21945 and previous config saved to /var/cache/conftool/dbconfig/20220307-110823-marostegui.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21944 and previous config saved to /var/cache/conftool/dbconfig/20220307-110330-root.json
* 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21943 and previous config saved to /var/cache/conftool/dbconfig/20220307-110304-ladsgroup.json
* 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1143.eqiad.wmnet with OS bullseye
* 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1084.eqiad.wmnet with OS buster
* 10:59 vgutierrez: pool cp1084 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
* 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21942 and previous config saved to /var/cache/conftool/dbconfig/20220307-104906-marostegui.json
* 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21941 and previous config saved to /var/cache/conftool/dbconfig/20220307-104826-root.json
* 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21940 and previous config saved to /var/cache/conftool/dbconfig/20220307-104759-ladsgroup.json
* 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
* 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
* 10:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
* 10:34 jayme: (re)started ferm on kubernetes1001
* 10:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21939 and previous config saved to /var/cache/conftool/dbconfig/20220307-103323-root.json
* 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21938 and previous config saved to /var/cache/conftool/dbconfig/20220307-103253-ladsgroup.json
* 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1143.eqiad.wmnet with OS bullseye
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21937 and previous config saved to /var/cache/conftool/dbconfig/20220307-102737-ladsgroup.json
* 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21936 and previous config saved to /var/cache/conftool/dbconfig/20220307-102730-ladsgroup.json
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P21935 and previous config saved to /var/cache/conftool/dbconfig/20220307-102209-marostegui.json
* 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21934 and previous config saved to /var/cache/conftool/dbconfig/20220307-102158-ladsgroup.json
* 10:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21933 and previous config saved to /var/cache/conftool/dbconfig/20220307-102129-ladsgroup.json
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21932 and previous config saved to /var/cache/conftool/dbconfig/20220307-102054-marostegui.json
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P21931 and previous config saved to /var/cache/conftool/dbconfig/20220307-101824-marostegui.json
* 10:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1084.eqiad.wmnet with OS buster
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21930 and previous config saved to /var/cache/conftool/dbconfig/20220307-101657-root.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21929 and previous config saved to /var/cache/conftool/dbconfig/20220307-101225-ladsgroup.json
* 10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS buster
* 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21928 and previous config saved to /var/cache/conftool/dbconfig/20220307-100624-ladsgroup.json
* 10:04 vgutierrez: pool cp2036 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21927 and previous config saved to /var/cache/conftool/dbconfig/20220307-100153-root.json
* 10:00 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 09:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21926 and previous config saved to /var/cache/conftool/dbconfig/20220307-095720-ladsgroup.json
* 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21925 and previous config saved to /var/cache/conftool/dbconfig/20220307-095120-ladsgroup.json
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21924 and previous config saved to /var/cache/conftool/dbconfig/20220307-095111-root.json
* 09:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21923 and previous config saved to /var/cache/conftool/dbconfig/20220307-094649-root.json
* 09:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
* 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21922 and previous config saved to /var/cache/conftool/dbconfig/20220307-094216-ladsgroup.json
* 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21921 and previous config saved to /var/cache/conftool/dbconfig/20220307-093701-ladsgroup.json
* 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21920 and previous config saved to /var/cache/conftool/dbconfig/20220307-093653-ladsgroup.json
* 09:36 jynus: updated non-A wikipedia.org DNS records [[phab:T302617|T302617]]
* 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21919 and previous config saved to /var/cache/conftool/dbconfig/20220307-093615-ladsgroup.json
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21918 and previous config saved to /var/cache/conftool/dbconfig/20220307-093607-root.json
* 09:35 jynus: updated non-A wikipedia.org DNS records
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21917 and previous config saved to /var/cache/conftool/dbconfig/20220307-093146-root.json
* 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21916 and previous config saved to /var/cache/conftool/dbconfig/20220307-093032-ladsgroup.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P21915 and previous config saved to /var/cache/conftool/dbconfig/20220307-093013-marostegui.json
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21914 and previous config saved to /var/cache/conftool/dbconfig/20220307-092924-root.json
* 09:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS buster
* 09:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 04s)
* 09:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
* 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21913 and previous config saved to /var/cache/conftool/dbconfig/20220307-092148-ladsgroup.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 60%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21912 and previous config saved to /var/cache/conftool/dbconfig/20220307-092103-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T300381|T300381]])', diff saved to https://phabricator.wikimedia.org/P21911 and previous config saved to /var/cache/conftool/dbconfig/20220307-092034-marostegui.json
* 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 09:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21910 and previous config saved to /var/cache/conftool/dbconfig/20220307-091527-ladsgroup.json
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21909 and previous config saved to /var/cache/conftool/dbconfig/20220307-091421-root.json
* 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21908 and previous config saved to /var/cache/conftool/dbconfig/20220307-090644-ladsgroup.json
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21907 and previous config saved to /var/cache/conftool/dbconfig/20220307-090600-root.json
* 09:01 dcausse: restarting blazegraph on wdqs1013 (jvm stuck for 6hours)
* 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21906 and previous config saved to /var/cache/conftool/dbconfig/20220307-090021-ladsgroup.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21905 and previous config saved to /var/cache/conftool/dbconfig/20220307-085917-root.json
* 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21904 and previous config saved to /var/cache/conftool/dbconfig/20220307-085139-ladsgroup.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 40%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21903 and previous config saved to /var/cache/conftool/dbconfig/20220307-085056-root.json
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:46 elukey: `kafka configs --alter --entity-type topics --entity-name udp_localhost-info --add-config retention.bytes=300000000000` on kafka-logging to reduce the size of the biggest topic partitions
* 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21902 and previous config saved to /var/cache/conftool/dbconfig/20220307-084641-ladsgroup.json
* 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21901 and previous config saved to /var/cache/conftool/dbconfig/20220307-084516-ladsgroup.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21900 and previous config saved to /var/cache/conftool/dbconfig/20220307-084413-root.json
* 08:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e3f70f699e37a27872b73f6483f6d27c669bb520}}: enwiki: Deploy Growth features to 100% of users ([[phab:T302846|T302846]]) (duration: 00m 50s)
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P21899 and previous config saved to /var/cache/conftool/dbconfig/20220307-084235-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21898 and previous config saved to /var/cache/conftool/dbconfig/20220307-084219-root.json
* 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21897 and previous config saved to /var/cache/conftool/dbconfig/20220307-083948-ladsgroup.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21896 and previous config saved to /var/cache/conftool/dbconfig/20220307-083553-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21895 and previous config saved to /var/cache/conftool/dbconfig/20220307-082716-root.json
* 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21894 and previous config saved to /var/cache/conftool/dbconfig/20220307-082443-ladsgroup.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 20%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21893 and previous config saved to /var/cache/conftool/dbconfig/20220307-082049-root.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21892 and previous config saved to /var/cache/conftool/dbconfig/20220307-081212-root.json
* 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21891 and previous config saved to /var/cache/conftool/dbconfig/20220307-080938-ladsgroup.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21890 and previous config saved to /var/cache/conftool/dbconfig/20220307-080545-root.json
* 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1144.eqiad.wmnet with OS bullseye
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21889 and previous config saved to /var/cache/conftool/dbconfig/20220307-075708-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P21888 and previous config saved to /var/cache/conftool/dbconfig/20220307-075523-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21887 and previous config saved to /var/cache/conftool/dbconfig/20220307-075504-root.json
* 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21886 and previous config saved to /var/cache/conftool/dbconfig/20220307-075433-ladsgroup.json
* 07:53 marostegui: dbmaint on db1181 s7@eqiad [[phab:T276150|T276150]]
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P21885 and previous config saved to /var/cache/conftool/dbconfig/20220307-075120-marostegui.json
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21884 and previous config saved to /var/cache/conftool/dbconfig/20220307-074923-ladsgroup.json
* 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 07:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21883 and previous config saved to /var/cache/conftool/dbconfig/20220307-074909-ladsgroup.json
* 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
* 07:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21882 and previous config saved to /var/cache/conftool/dbconfig/20220307-074001-root.json
* 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21881 and previous config saved to /var/cache/conftool/dbconfig/20220307-073405-ladsgroup.json
* 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1144.eqiad.wmnet with OS bullseye
* 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21880 and previous config saved to /var/cache/conftool/dbconfig/20220307-072624-ladsgroup.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21879 and previous config saved to /var/cache/conftool/dbconfig/20220307-072457-root.json
* 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21878 and previous config saved to /var/cache/conftool/dbconfig/20220307-072453-ladsgroup.json
* 07:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 07:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21877 and previous config saved to /var/cache/conftool/dbconfig/20220307-071900-ladsgroup.json
* 07:15 elukey: `elukey@ml-staging-ctrl2002:~$ sudo systemctl reset-failed ifup@ens13.service`
* 07:14 elukey: kill tmux sessions of user 'zpapierski' on wdqs[1004,2002,2003] (puppet broken, offboarded user)
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21876 and previous config saved to /var/cache/conftool/dbconfig/20220307-071227-ladsgroup.json
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21875 and previous config saved to /var/cache/conftool/dbconfig/20220307-070953-root.json
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:06 marostegui: dbmaint on db1179 s3@eqiad [[phab:T302222|T302222]]
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P21874 and previous config saved to /var/cache/conftool/dbconfig/20220307-070537-marostegui.json
* 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21873 and previous config saved to /var/cache/conftool/dbconfig/20220307-070355-ladsgroup.json
* 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21872 and previous config saved to /var/cache/conftool/dbconfig/20220307-065839-ladsgroup.json
* 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21871 and previous config saved to /var/cache/conftool/dbconfig/20220307-065832-ladsgroup.json
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21870 and previous config saved to /var/cache/conftool/dbconfig/20220307-065722-ladsgroup.json
* 06:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:52 urbanecm: Reset authentication throttle for 217.23.37.10 via resetAuthenticationThrottle.php ([[phab:T302973|T302973]])
* 06:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|2e9fdd4}}: {{Gerrit|867bb7b}}: Add throttle rules ([[phab:T302973|T302973]]; [[phab:T303002|T303002]]) (duration: 00m 49s)
* 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21869 and previous config saved to /var/cache/conftool/dbconfig/20220307-064327-ladsgroup.json
* 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21868 and previous config saved to /var/cache/conftool/dbconfig/20220307-064217-ladsgroup.json
* 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21867 and previous config saved to /var/cache/conftool/dbconfig/20220307-062823-ladsgroup.json
* 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21866 and previous config saved to /var/cache/conftool/dbconfig/20220307-062713-ladsgroup.json
* 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21865 and previous config saved to /var/cache/conftool/dbconfig/20220307-061318-ladsgroup.json
* 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21864 and previous config saved to /var/cache/conftool/dbconfig/20220307-060819-ladsgroup.json
* 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21863 and previous config saved to /var/cache/conftool/dbconfig/20220307-060811-ladsgroup.json
* 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21862 and previous config saved to /var/cache/conftool/dbconfig/20220307-055307-ladsgroup.json
* 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1147.eqiad.wmnet with OS bullseye
* 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21861 and previous config saved to /var/cache/conftool/dbconfig/20220307-053802-ladsgroup.json
* 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
* 05:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
* 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21860 and previous config saved to /var/cache/conftool/dbconfig/20220307-052257-ladsgroup.json
* 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1147.eqiad.wmnet with OS bullseye
* 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21859 and previous config saved to /var/cache/conftool/dbconfig/20220307-051807-ladsgroup.json
* 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21858 and previous config saved to /var/cache/conftool/dbconfig/20220307-051537-ladsgroup.json
* 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
 
== 2022-03-04 ==
* 17:59 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:57 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 17:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 17:46 mforns@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 07s)
* 17:46 mforns@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
* 17:39 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@19520c1]: (no justification provided) (duration: 00m 08s)
* 17:39 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@19520c1]: (no justification provided)
* 17:09 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 08s)
* 17:09 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 16:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
* 16:35 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 16:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 10s)
* 16:13 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
* 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21856 and previous config saved to /var/cache/conftool/dbconfig/20220304-160629-ladsgroup.json
* 16:03 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 03s)
* 16:03 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 15:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1086.eqiad.wmnet with OS buster
* 15:58 vgutierrez: pool cp1086 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS buster
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21854 and previous config saved to /var/cache/conftool/dbconfig/20220304-155124-ladsgroup.json
* 15:51 vgutierrez: pool cp2038 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:49 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
* 15:49 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
* 15:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
* 15:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
* 15:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
* 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21852 and previous config saved to /var/cache/conftool/dbconfig/20220304-153619-ladsgroup.json
* 15:34 XioNoX: blackhole IPs - [[phab:T303055|T303055]]
* 15:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
* 15:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1086.eqiad.wmnet with OS buster
* 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21851 and previous config saved to /var/cache/conftool/dbconfig/20220304-152114-ladsgroup.json
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21850 and previous config saved to /var/cache/conftool/dbconfig/20220304-152007-ladsgroup.json
* 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
* 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
* 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
* 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21849 and previous config saved to /var/cache/conftool/dbconfig/20220304-151937-ladsgroup.json
* 15:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS buster
* 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21848 and previous config saved to /var/cache/conftool/dbconfig/20220304-150433-ladsgroup.json
* 14:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad.service on elastic1049 to resolve CirrusSearchJVMGCOldPoolFlatlined alert
* 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21847 and previous config saved to /var/cache/conftool/dbconfig/20220304-144926-ladsgroup.json
* 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS buster
* 14:43 vgutierrez: pool cp3059 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21846 and previous config saved to /var/cache/conftool/dbconfig/20220304-143421-ladsgroup.json
* 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21845 and previous config saved to /var/cache/conftool/dbconfig/20220304-143214-ladsgroup.json
* 14:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21844 and previous config saved to /var/cache/conftool/dbconfig/20220304-143206-ladsgroup.json
* 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21842 and previous config saved to /var/cache/conftool/dbconfig/20220304-141701-ladsgroup.json
* 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21841 and previous config saved to /var/cache/conftool/dbconfig/20220304-140156-ladsgroup.json
* 13:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1302-1306].eqiad.wmnet
* 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21840 and previous config saved to /var/cache/conftool/dbconfig/20220304-134651-ladsgroup.json
* 13:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21839 and previous config saved to /var/cache/conftool/dbconfig/20220304-134443-ladsgroup.json
* 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21838 and previous config saved to /var/cache/conftool/dbconfig/20220304-134436-ladsgroup.json
* 13:38 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21837 and previous config saved to /var/cache/conftool/dbconfig/20220304-132931-ladsgroup.json
* 13:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1302-1306].eqiad.wmnet
* 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21836 and previous config saved to /var/cache/conftool/dbconfig/20220304-131426-ladsgroup.json
* 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21835 and previous config saved to /var/cache/conftool/dbconfig/20220304-125921-ladsgroup.json
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21834 and previous config saved to /var/cache/conftool/dbconfig/20220304-125714-ladsgroup.json
* 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21833 and previous config saved to /var/cache/conftool/dbconfig/20220304-125706-ladsgroup.json
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21832 and previous config saved to /var/cache/conftool/dbconfig/20220304-124201-ladsgroup.json
* 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21831 and previous config saved to /var/cache/conftool/dbconfig/20220304-122656-ladsgroup.json
* 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21830 and previous config saved to /var/cache/conftool/dbconfig/20220304-121152-ladsgroup.json
* 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21829 and previous config saved to /var/cache/conftool/dbconfig/20220304-120944-ladsgroup.json
* 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
* 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21828 and previous config saved to /var/cache/conftool/dbconfig/20220304-120937-ladsgroup.json
* 12:04 jbond: enable SameSite=Strict on idp
* 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21827 and previous config saved to /var/cache/conftool/dbconfig/20220304-115432-ladsgroup.json
* 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21826 and previous config saved to /var/cache/conftool/dbconfig/20220304-113927-ladsgroup.json
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21825 and previous config saved to /var/cache/conftool/dbconfig/20220304-112422-ladsgroup.json
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21824 and previous config saved to /var/cache/conftool/dbconfig/20220304-112214-ladsgroup.json
* 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
* 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21823 and previous config saved to /var/cache/conftool/dbconfig/20220304-112207-ladsgroup.json
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
* 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4024.ulsfo.wmnet with OS buster
* 11:09 vgutierrez: pool cp4024 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21822 and previous config saved to /var/cache/conftool/dbconfig/20220304-110702-ladsgroup.json
* 10:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
* 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
* 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21821 and previous config saved to /var/cache/conftool/dbconfig/20220304-105157-ladsgroup.json
* 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS buster
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4024.ulsfo.wmnet with OS buster
* 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21820 and previous config saved to /var/cache/conftool/dbconfig/20220304-103652-ladsgroup.json
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21819 and previous config saved to /var/cache/conftool/dbconfig/20220304-103444-ladsgroup.json
* 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21818 and previous config saved to /var/cache/conftool/dbconfig/20220304-103437-ladsgroup.json
* 10:29 vgutierrez: pool cp5004 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5004.eqsin.wmnet with OS buster
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21817 and previous config saved to /var/cache/conftool/dbconfig/20220304-101932-ladsgroup.json
* 10:08 aqu@deploy1002: Finished deploy [airflow-dags/analytics@1c8384f]: AF //tion default args (duration: 00m 07s)
* 10:08 aqu@deploy1002: Started deploy [airflow-dags/analytics@1c8384f]: AF //tion default args
* 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21816 and previous config saved to /var/cache/conftool/dbconfig/20220304-100427-ladsgroup.json
* 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T300992|T300992]])', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20220304-094918-ladsgroup.json
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21815 and previous config saved to /var/cache/conftool/dbconfig/20220304-094710-ladsgroup.json
* 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21814 and previous config saved to /var/cache/conftool/dbconfig/20220304-094702-ladsgroup.json
* 09:43 vgutierrez: restart varnish on cp3056
* 09:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
* 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
* 09:37 vgutierrez: restart varnish on cp3058
* 09:33 vgutierrez: restart varnish on cp3060
* 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21813 and previous config saved to /var/cache/conftool/dbconfig/20220304-093157-ladsgroup.json
* 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21812 and previous config saved to /var/cache/conftool/dbconfig/20220304-091652-ladsgroup.json
* 09:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5004.eqsin.wmnet with OS buster
* 09:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1005-1006].eqiad.wmnet
* 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21811 and previous config saved to /var/cache/conftool/dbconfig/20220304-090147-ladsgroup.json
* 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21810 and previous config saved to /var/cache/conftool/dbconfig/20220304-085939-ladsgroup.json
* 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21809 and previous config saved to /var/cache/conftool/dbconfig/20220304-085932-ladsgroup.json
* 08:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21808 and previous config saved to /var/cache/conftool/dbconfig/20220304-084427-ladsgroup.json
* 08:34 akosiaris: [[phab:T303027|T303027]] depool mw130[2-6]. Old jobrunners/videoscalers, being decommisioned
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=mw130[2-6].eqiad.wmnet
* 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21807 and previous config saved to /var/cache/conftool/dbconfig/20220304-082922-ladsgroup.json
* 08:23 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1005-1006].eqiad.wmnet
* 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21806 and previous config saved to /var/cache/conftool/dbconfig/20220304-081417-ladsgroup.json
* 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21805 and previous config saved to /var/cache/conftool/dbconfig/20220304-081210-ladsgroup.json
* 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:27 XioNoX: push pfw policies - [[phab:T303003|T303003]]
* 01:35 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 01:34 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 01:34 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 01:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 01:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
* 01:32 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
* 01:32 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
* 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
* 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
* 01:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 01:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 01:29 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 01:29 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 01:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 01:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 01:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 01:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 01:23 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
* 01:23 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
* 01:22 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
 
== 2022-03-03 ==
* 21:35 brennen: end of UTC late backport & config window / training
* 21:30 brennen@deploy1002: Finished scap: Config: [[gerrit:766229{{!}}Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)]] (duration: 01m 33s)
* 21:28 brennen@deploy1002: Started scap: Config: [[gerrit:766229{{!}}Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)]]
* 21:28 brennen@deploy1002: Synchronized multiversion/MWRealm.php: Config: [[gerrit:766229{{!}}Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)]] (duration: 00m 48s)
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:35 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 19:32 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): no current blockers; proceeding to all wikis
* 19:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: Backport: [[gerrit:767812{{!}}Unset data-toc in SkinVector (T302461)]] (duration: 00m 49s)
* 19:23 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/MinervaNeue/resources/skins.minerva.base.styles/userMenu.less: Backport: [[gerrit:767811{{!}}Remove user navigation min width and width (T302753)]] (duration: 00m 51s)
* 19:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:50 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:39 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:29 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:11 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-puppet-dashboard (duration: 09m 12s)
* 18:02 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 18:02 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-puppet-dashboard
* 17:59 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 17:58 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|Idf7b21159423}} (duration: 00m 51s)
* 17:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 17:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 17:48 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 17:47 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21802 and previous config saved to /var/cache/conftool/dbconfig/20220303-173630-ladsgroup.json
* 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21801 and previous config saved to /var/cache/conftool/dbconfig/20220303-172125-ladsgroup.json
* 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21800 and previous config saved to /var/cache/conftool/dbconfig/20220303-170621-ladsgroup.json
* 17:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 17:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 16:53 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 16:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21799 and previous config saved to /var/cache/conftool/dbconfig/20220303-165116-ladsgroup.json
* 16:30 godog: roll-restart logstash to pick up config changes - [[phab:T291946|T291946]]
* 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1148.eqiad.wmnet with OS bullseye
* 16:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1148.eqiad.wmnet with reason: host reimage
* 15:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1148.eqiad.wmnet with reason: host reimage
* 15:53 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:47 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1148.eqiad.wmnet with OS bullseye
* 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21798 and previous config saved to /var/cache/conftool/dbconfig/20220303-152242-ladsgroup.json
* 15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 15:21 moritzm: restarting FPM/Apache on mw  job runners to pick up expat security updates
* 15:08 mutante: [[phab:T296022|T296022]] - phabricator - disabled git cloning over ssh for 'stewardscripts' repo - stewards have been asked via mailing list
* 14:48 godog: force a puppet run on cp6011 to unblock icinga and disable puppet again, cc bblack
* 14:48 Lucas_WMDE: UTC afternoon backport window done
* 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport: [[gerrit:767690{{!}}GLAM event: Update landing page content (T301097)]] (full sync because of i18n change) (duration: 09m 45s)
* 14:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport: [[gerrit:767690{{!}}GLAM event: Update landing page content (T301097)]] (full sync because of i18n change)
* 14:26 XioNoX: merge Icinga: use parent switch shortname
* 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 14:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 14:04 volans: upgraded spicerack to v2.1.0 on cumin1001/cumin2002
* 14:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21794 and previous config saved to /var/cache/conftool/dbconfig/20220303-135737-ladsgroup.json
* 13:54 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:54 akosiaris: switch changeprop, changeprop-jobqueue to use rdb1011. [[phab:T281217|T281217]]
* 13:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 13:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
* 13:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
* 13:53 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
* 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
* 13:52 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:52 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 13:52 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 13:51 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 13:45 akosiaris: roll restart ores uwsgi and celery for rdb1005 decommissioning.  [[phab:T281217|T281217]]
* 13:44 akosiaris@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21793 and previous config saved to /var/cache/conftool/dbconfig/20220303-134232-ladsgroup.json
* 13:20 moritzm: restarting FPM/Apache on mw app servers to pick up expat security updates
* 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21791 and previous config saved to /var/cache/conftool/dbconfig/20220303-131223-ladsgroup.json
* 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1149.eqiad.wmnet with OS bullseye
* 12:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1149.eqiad.wmnet with reason: host reimage
* 12:47 hashar: Upgrading Quibble on CI Jenkins jobs from 1.3.0 to 1.4.3 https://gerrit.wikimedia.org/r/c/integration/config/+/767749/
* 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1149.eqiad.wmnet with reason: host reimage
* 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1149.eqiad.wmnet with OS bullseye
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21790 and previous config saved to /var/cache/conftool/dbconfig/20220303-123030-ladsgroup.json
* 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 11:49 volans: uploaded spicerack_2.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21789 and previous config saved to /var/cache/conftool/dbconfig/20220303-113304-kormat.json
* 11:18 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21788 and previous config saved to /var/cache/conftool/dbconfig/20220303-111801-kormat.json
* 11:02 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21787 and previous config saved to /var/cache/conftool/dbconfig/20220303-110257-kormat.json
* 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21786 and previous config saved to /var/cache/conftool/dbconfig/20220303-110224-ladsgroup.json
* 11:02 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling db1126 to full weight', diff saved to https://phabricator.wikimedia.org/P21785 and previous config saved to /var/cache/conftool/dbconfig/20220303-110220-kormat.json
* 10:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: [[gerrit:767692{{!}}rdbms: Change getConnectionRef to return with getLazyConnectionRef (T255493)]] (duration: 00m 50s)
* 10:50 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.24/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: [[gerrit:767691{{!}}rdbms: Change getConnectionRef to return with getLazyConnectionRef (T255493)]] (duration: 00m 51s)
* 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21784 and previous config saved to /var/cache/conftool/dbconfig/20220303-104713-ladsgroup.json
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21783 and previous config saved to /var/cache/conftool/dbconfig/20220303-103659-ladsgroup.json
* 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21782 and previous config saved to /var/cache/conftool/dbconfig/20220303-103209-ladsgroup.json
* 10:30 XioNoX: repool ulsfo
* 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21781 and previous config saved to /var/cache/conftool/dbconfig/20220303-102154-ladsgroup.json
* 10:18 elukey: kubectl cordon kubernetes200[1-4] to avoid scheduling pods on nodes that will be decommed during the next weeks - [[phab:T302208|T302208]]
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21780 and previous config saved to /var/cache/conftool/dbconfig/20220303-101704-ladsgroup.json
* 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1160.eqiad.wmnet with OS bullseye
* 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21779 and previous config saved to /var/cache/conftool/dbconfig/20220303-100649-ladsgroup.json
* 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
* 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21778 and previous config saved to /var/cache/conftool/dbconfig/20220303-095145-ladsgroup.json
* 09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
* 09:37 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@1c8384f]: AF //tion default args (duration: 00m 09s)
* 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1160.eqiad.wmnet with OS bullseye
* 09:37 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@1c8384f]: AF //tion default args
* 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21777 and previous config saved to /var/cache/conftool/dbconfig/20220303-093306-ladsgroup.json
* 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2073 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21775 and previous config saved to /var/cache/conftool/dbconfig/20220303-091340-ladsgroup.json
* 09:12 moritzm: restarting FPM/Apache on mw API servers to pick up expat security updates
* 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2073.codfw.wmnet with OS bullseye
* 09:01 moritzm: restarting superset on an-tool1010 to pick up expat security updates
* 08:52 taavi: UTC morning deploys done
* 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21774 and previous config saved to /var/cache/conftool/dbconfig/20220303-085125-ladsgroup.json
* 08:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21773 and previous config saved to /var/cache/conftool/dbconfig/20220303-085118-ladsgroup.json
* 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: host reimage
* 08:48 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:766869{{!}}GLAM event: Update wgGECampaigns and wgGECampaignTopics (T301029)]] (duration: 00m 51s)
* 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: host reimage
* 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21772 and previous config saved to /var/cache/conftool/dbconfig/20220303-083613-ladsgroup.json
* 08:34 moritzm: installing expat security updates
* 08:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2073.codfw.wmnet with OS bullseye
* 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2073 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21771 and previous config saved to /var/cache/conftool/dbconfig/20220303-082842-ladsgroup.json
* 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance
* 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance
* 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2090 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21770 and previous config saved to /var/cache/conftool/dbconfig/20220303-082656-ladsgroup.json
* 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21768 and previous config saved to /var/cache/conftool/dbconfig/20220303-082108-ladsgroup.json
* 08:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4034.ulsfo.wmnet with OS buster
* 08:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:766306{{!}}Add centralauth-suppress to steward and wmf-supportsafety at metawiki (T302675)]] (duration: 00m 50s)
* 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2090.codfw.wmnet with OS bullseye
* 08:13 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:767683{{!}}fawiki: Remove the Book namespace (T302957)]] (duration: 00m 51s)
* 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21767 and previous config saved to /var/cache/conftool/dbconfig/20220303-080603-ladsgroup.json
* 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: host reimage
* 07:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: host reimage
* 07:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4034.ulsfo.wmnet with reason: host reimage
* 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21766 and previous config saved to /var/cache/conftool/dbconfig/20220303-075534-ladsgroup.json
* 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 07:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4034.ulsfo.wmnet with reason: host reimage
* 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2090.codfw.wmnet with OS bullseye
* 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2090 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21765 and previous config saved to /var/cache/conftool/dbconfig/20220303-074209-ladsgroup.json
* 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance
* 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance
* 07:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21764 and previous config saved to /var/cache/conftool/dbconfig/20220303-073920-ladsgroup.json
* 07:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
* 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21763 and previous config saved to /var/cache/conftool/dbconfig/20220303-072415-ladsgroup.json
* 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21762 and previous config saved to /var/cache/conftool/dbconfig/20220303-071800-ladsgroup.json
* 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2106.codfw.wmnet with OS bullseye
* 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21761 and previous config saved to /var/cache/conftool/dbconfig/20220303-070910-ladsgroup.json
* 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2106.codfw.wmnet with reason: host reimage
* 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21760 and previous config saved to /var/cache/conftool/dbconfig/20220303-065405-ladsgroup.json
* 06:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2106.codfw.wmnet with reason: host reimage
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21759 and previous config saved to /var/cache/conftool/dbconfig/20220303-064945-ladsgroup.json
* 06:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 06:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21758 and previous config saved to /var/cache/conftool/dbconfig/20220303-064937-ladsgroup.json
* 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2106.codfw.wmnet with OS bullseye
* 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21757 and previous config saved to /var/cache/conftool/dbconfig/20220303-063514-ladsgroup.json
* 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
* 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
* 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21756 and previous config saved to /var/cache/conftool/dbconfig/20220303-063433-ladsgroup.json
* 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21755 and previous config saved to /var/cache/conftool/dbconfig/20220303-063350-ladsgroup.json
* 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2119.codfw.wmnet with OS bullseye
* 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21754 and previous config saved to /var/cache/conftool/dbconfig/20220303-061928-ladsgroup.json
* 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2119.codfw.wmnet with reason: host reimage
* 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2119.codfw.wmnet with reason: host reimage
* 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21753 and previous config saved to /var/cache/conftool/dbconfig/20220303-060423-ladsgroup.json
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21752 and previous config saved to /var/cache/conftool/dbconfig/20220303-060006-ladsgroup.json
* 06:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21751 and previous config saved to /var/cache/conftool/dbconfig/20220303-055959-ladsgroup.json
* 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2119.codfw.wmnet with OS bullseye
* 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21750 and previous config saved to /var/cache/conftool/dbconfig/20220303-054657-ladsgroup.json
* 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
* 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
* 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21749 and previous config saved to /var/cache/conftool/dbconfig/20220303-054454-ladsgroup.json
* 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21748 and previous config saved to /var/cache/conftool/dbconfig/20220303-053324-ladsgroup.json
* 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21747 and previous config saved to /var/cache/conftool/dbconfig/20220303-052949-ladsgroup.json
* 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2136.codfw.wmnet with OS bullseye
* 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21746 and previous config saved to /var/cache/conftool/dbconfig/20220303-051444-ladsgroup.json
* 04:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
* 04:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
* 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21745 and previous config saved to /var/cache/conftool/dbconfig/20220303-044933-ladsgroup.json
* 04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21744 and previous config saved to /var/cache/conftool/dbconfig/20220303-044926-ladsgroup.json
* 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2136.codfw.wmnet with OS bullseye
* 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21743 and previous config saved to /var/cache/conftool/dbconfig/20220303-043942-ladsgroup.json
* 04:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
* 04:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
* 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21742 and previous config saved to /var/cache/conftool/dbconfig/20220303-043759-ladsgroup.json
* 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21741 and previous config saved to /var/cache/conftool/dbconfig/20220303-043421-ladsgroup.json
* 04:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2140.codfw.wmnet with OS bullseye
* 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21740 and previous config saved to /var/cache/conftool/dbconfig/20220303-041916-ladsgroup.json
* 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
* 04:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
* 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21739 and previous config saved to /var/cache/conftool/dbconfig/20220303-040412-ladsgroup.json
* 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21738 and previous config saved to /var/cache/conftool/dbconfig/20220303-035954-ladsgroup.json
* 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2140.codfw.wmnet with OS bullseye
* 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21737 and previous config saved to /var/cache/conftool/dbconfig/20220303-035328-ladsgroup.json
* 03:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
* 03:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
* 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
* 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
* 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21736 and previous config saved to /var/cache/conftool/dbconfig/20220303-035134-ladsgroup.json
* 03:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2147.codfw.wmnet with OS bullseye
* 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21735 and previous config saved to /var/cache/conftool/dbconfig/20220303-033628-ladsgroup.json
* 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
* 03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
* 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21734 and previous config saved to /var/cache/conftool/dbconfig/20220303-032123-ladsgroup.json
* 03:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2147.codfw.wmnet with OS bullseye
* 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21733 and previous config saved to /var/cache/conftool/dbconfig/20220303-030618-ladsgroup.json
* 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 ([[phab:T302950|T302950]])', diff saved to https://phabricator.wikimedia.org/P21732 and previous config saved to /var/cache/conftool/dbconfig/20220303-030518-ladsgroup.json
* 03:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
* 03:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
* 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21731 and previous config saved to /var/cache/conftool/dbconfig/20220303-025500-ladsgroup.json
* 02:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 02:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 01:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
* 01:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
* 00:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dumpsdata1007.eqiad.wmnet
* 00:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 00:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts dumpsdata1007.eqiad.wmnet


== 2022-03-02 ==
== 2022-05-02 ==
* 23:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb2002.codfw.wmnet with OS bullseye
* 23:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS bullseye
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
* 23:25 ryankemper: [[phab:T276198|T276198]] Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from [[phab:T276198|T276198]]"'`
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
* 23:21 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage
* 23:21 ryankemper: [[phab:T276198|T276198]] https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly
* 22:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage
* 23:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host krb2002.codfw.wmnet with OS bullseye
* 23:08 robh
* 22:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS bullseye
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:49 catrope@deploy1002: Finished scap: Backport: [[gerrit:788338{{!}}[TOC] Remove pointer-events:none on .sidebar-toc-link (T307271)]] and [[gerrit:788336{{!}}Video landing page: Show different title/body text on mobile (T303785)]] (duration: 11m 45s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 20:44 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation:


== 2022-03-01 ==
== 2022-05-01 ==
* 22:51 inflatador: [[phab:T276198|T276198]] reenabled puppet on elastic1052.eqiad.wmnet
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P27195 and previous config saved to /var/cache/conftool/dbconfig/20220501-235742-ladsgroup.json
* 22:37 inflatador: [[phab:T276198|T276198]] rebooting elastic1052.eqiad.wmnet to test failure condition
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27194 and previous config saved to /var/cache/conftool/dbconfig/20220501-235700-ladsgroup.json
* 22:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
* 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27193 and previous config saved to /var/cache/conftool/dbconfig/20220501-235549-ladsgroup.json
* 22:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
* 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 22:32 inflatador: [[phab:T276198|T276198]] disabling puppet on elastic1052.eqiad.wmnet to test failure condition (rebooting shortly)
* 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
* 21:53 dancy@deploy1002: Finished scap: Resync to try to clear alerts (duration: 12m 08s)
* 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:41 dancy@deploy1002: Started scap: Resync to try to clear alerts
* 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:36 dancy@deploy1002: Started scap: Resync to try to clear alerts
* 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 20:36 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 20:33 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): no current blockers; proceeding to group0; note this may briefly trigger some version alerts
* 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 20:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/includes: Backport: [[gerrit:767089{{!}}Revert "preferences: Use a faster and simpler form descriptor when validating" (T302643)]] (duration: 00m 55s)
* 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 20:05 mutante: alert1001 - re-enabled puppet
* 23:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 20:05 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]] (duration: 53m 17s)
* 23:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 19:45 mutante: alert1001 - disable puppet, systemctl stop ircecho - to stop bot spam, caused somehow by new scap version breaking "mw versions mismwatch" alerting - affects labtestwiki,testwiki,testwikidatawiki
* 23:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27192 and previous config saved to /var/cache/conftool/dbconfig/20220501-235443-ladsgroup.json
* 19:38 mutante: mw1449 - scap pull
* 23:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27191 and previous config saved to /var/cache/conftool/dbconfig/20220501-234237-ladsgroup.json
* 19:36 mutante: mw1414 - scap pull
* 23:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27190 and previous config saved to /var/cache/conftool/dbconfig/20220501-233938-ladsgroup.json
* 19:11 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.24  refs [[phab:T300200|T300200]]
* 23:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27189 and previous config saved to /var/cache/conftool/dbconfig/20220501-232732-ladsgroup.json
* 19:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2008.codfw.wmnet
* 23:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27188 and previous config saved to /var/cache/conftool/dbconfig/20220501-232433-ladsgroup.json
* 19:01 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P27187 and previous config saved to /var/cache/conftool/dbconfig/20220501-231227-ladsgroup.json
* 18:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27186 and previous config saved to /var/cache/conftool/dbconfig/20220501-230928-ladsgroup.json
* 18:57 brennen: 1.38.0-wmf.24 train ([[phab:T300200|T300200]]): there's currently a single blocker at [[phab:T302643|T302643]]; staging to testwikis and holding there until backport's available
* 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27185 and previous config saved to /var/cache/conftool/dbconfig/20220501-230715-ladsgroup.json
* 18:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2008.codfw.wmnet
* 23:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 18:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
* 23:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 18:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
* 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27184 and previous config saved to /var/cache/conftool/dbconfig/20220501-230707-ladsgroup.json
* 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21626 and previous config saved to /var/cache/conftool/dbconfig/20220301-180216-ladsgroup.json
* 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P27183 and previous config saved to /var/cache/conftool/dbconfig/20220501-225626-ladsgroup.json
* 17:52 cwhite: completed grafana upgrade in eqiad [[phab:T282863|T282863]]
* 22:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 17:50 herron: re-enabling puppet and ircecho on alert1001
* 22:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 17:47 cwhite: upgrade grafana in eqiad [[phab:T282863|T282863]]
* 22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27182 and previous config saved to /var/cache/conftool/dbconfig/20220501-225202-ladsgroup.json
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21625 and previous config saved to /var/cache/conftool/dbconfig/20220301-174711-ladsgroup.json
* 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P27181 and previous config saved to /var/cache/conftool/dbconfig/20220501-223920-ladsgroup.json
* 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27180 and previous config saved to /var/cache/conftool/dbconfig/20220501-223657-ladsgroup.json
* 17:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27179 and previous config saved to /var/cache/conftool/dbconfig/20220501-222415-ladsgroup.json
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21624 and previous config saved to /var/cache/conftool/dbconfig/20220301-173206-ladsgroup.json
* 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27178 and previous config saved to /var/cache/conftool/dbconfig/20220501-222152-ladsgroup.json
* 17:24 dancy@deploy1002: Finished scap: testing container image build (duration: 28m 39s)
* 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27177 and previous config saved to /var/cache/conftool/dbconfig/20220501-221938-ladsgroup.json
* 17:17 herron: stopped ircecho on alert1001 due to systemd unit alert shower
* 22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21622 and previous config saved to /var/cache/conftool/dbconfig/20220301-171701-ladsgroup.json
* 22:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T300992|T300992]])', diff saved to https://phabricator.wikimedia.org/P21621 and previous config saved to /var/cache/conftool/dbconfig/20220301-171441-ladsgroup.json
* 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27176 and previous config saved to /var/cache/conftool/dbconfig/20220501-221930-ladsgroup.json
* 17:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27175 and previous config saved to /var/cache/conftool/dbconfig/20220501-220910-ladsgroup.json
* 17:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27174 and previous config saved to /var/cache/conftool/dbconfig/20220501-220425-ladsgroup.json
* 16:55 dancy@deploy1002: Started scap: testing container image build
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P27173 and previous config saved to /var/cache/conftool/dbconfig/20220501-215405-ladsgroup.json
* 16:24 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 03s)
* 21:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27172 and previous config saved to /var/cache/conftool/dbconfig/20220501-214920-ladsgroup.json
* 16:23 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P27171 and previous config saved to /var/cache/conftool/dbconfig/20220501-213750-ladsgroup.json
* 16:12 moritzm: restarting apache on logstash nodes to pick up expat update
* 21:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 16:11 elukey@deploy1002: Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - [[phab:T300195|T300195]] (duration: 36m 13s)
* 21:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 16:05 moritzm: restarting nginx on wcqs* nodes to pick up expat update
* 21:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:35 elukey@deploy1002: Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - [[phab:T300195|T300195]]
* 21:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 15:21 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 07s)
* 21:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27170 and previous config saved to /var/cache/conftool/dbconfig/20220501-213415-ladsgroup.json
* 15:21 ntsako@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
* 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27169 and previous config saved to /var/cache/conftool/dbconfig/20220501-213203-ladsgroup.json
* 15:06 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2003.codfw.wmnet
* 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 14:52 elukey: elukey@deploy1002:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the node)
* 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27168 and previous config saved to /var/cache/conftool/dbconfig/20220501-213155-ladsgroup.json
* 14:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27167 and previous config saved to /var/cache/conftool/dbconfig/20220501-211650-ladsgroup.json
* 14:51 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 21:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 14:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2002.codfw.wmnet
* 21:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27166 and previous config saved to /var/cache/conftool/dbconfig/20220501-210145-ladsgroup.json
* 14:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 20:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 14:38 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 14:36 vgutierrez: pool cp1087 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 20:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS buster
* 20:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 14:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27165 and previous config saved to /var/cache/conftool/dbconfig/20220501-204640-ladsgroup.json
* 14:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27164 and previous config saved to /var/cache/conftool/dbconfig/20220501-204427-ladsgroup.json
* 14:32 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
* 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 14:28 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2001.codfw.wmnet
* 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:19 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 14:19 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:51 cwhite: reload nginx on conf1004/1005 to pick up cert changes
* 14:15 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 06:37 cwhite: restart etcdmirror-conftool.eqiad.wmnet on conf2005
* 14:14 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 05:44 mutante: puppetmaster1001 - sudo puppet cert clean etcd.eqiad.wmnet (expired)
* 14:09 moritzm: restarting nginx on wdqs* nodes to pick up expat update
* 02:32 Krinkle: repool mw1340
* 14:03 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
* 14:03 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:57 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:53 mmandere: restart purged on cp60[15-16]
* 13:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
* 13:48 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 13:48 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 13:48 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
* 13:48 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
* 13:44 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
* 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:43 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 13:43 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:40 kormat: Deploying wmfmariadbpy 0.9 [[phab:T302796|T302796]]
* 13:40 kormat: uploaded wmfmariadbpy 0.9 to apt.wm.o [[phab:T302796|T302796]]
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 13:39 klausman@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
* 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
* 13:32 moritzm: restarting nginx on registry* nodes to pick up expat update
* 13:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster
* 13:15 XioNoX: restart cr1-drmrs for software upgrade
* 13:03 moritzm: restarting FPM/Apache on parsoid hosts to pick up expat update
* 12:50 vgutierrez: pool cp3062 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 12:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster
* 12:39 moritzm: installing expat security updates
* 12:34 mmandere: restart purged on cp60[12-14]
* 12:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s)
* 12:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker
* 12:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)
* 12:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker
* 12:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)
* 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration
* 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)
* 12:09 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration
* 11:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:36 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 11:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
* 11:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet
* 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 11:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 11:27 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 11:27 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:21 _joe_: restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available.[[phab:T244843|T244843]]
* 11:18 _joe_: also removed the ipvsadm entry for apaches:80 [[phab:T244843|T244843]]
* 11:17 jayme: rolled back linkrecommendation staging helm release to revision 12 - [[phab:T302744|T302744]]
* 11:17 _joe_: restarting pybal on lvs1020 [[phab:T244843|T244843]]
* 11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
* 11:11 _joe_: restarted pybal on lvs2009, [[phab:T244843|T244843]]
* 11:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
* 11:07 _joe_: restarted pybal on lvs2010, [[phab:T244843|T244843]]
* 11:02 mmandere: restart purged on cp60[09,10,11]
* 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
* 10:47 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
* 10:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts
* 10:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts
* 10:39 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts
* 10:31 mmandere: restart purged on cp600[6-8]
* 10:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 10:05 vgutierrez: pool cp2039 running HAProxy as TLS termination layer - [[phab:T290005|T290005]] [[phab:T271421|T271421]]
* 09:48 elukey: elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)
* 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster
* 09:33 _joe_: restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted
* 09:31 _joe_: restart pybal on lvs1020
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts
* 09:25 elukey: restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka)
* 09:25 _joe_: manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore
* 09:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts
* 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts
* 09:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts
* 09:22 _joe_: restarted pybal on lvs2009, the mw api is now effectively https-only in codfw [[phab:T287820|T287820]]
* 09:20 _joe_: restarted pybal on lvs2010
* 09:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
* 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
* 09:06 elukey: restart purged on cp6005
* 08:57 elukey: restart purged on cp6004
* 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster
* 08:27 urbanecm: UTC morning B&C window done
* 08:25 elukey: restart purged on cp6003
* 08:16 moritzm: drain instances off ganeti2008 for eventual decom
* 08:08 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: {{Gerrit|d149208dfd7e5fbf51f44dd0bf7dae3b2e2f5159}}: Use service-proxy to connect to linkrecommendation ([[phab:T302719|T302719]]) (duration: 00m 49s)
* 07:59 elukey: restart purged on cp6002
* 06:58 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): [[phab:T302464|T302464]] test (duration: 00m 17s)
* 06:57 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): [[phab:T302464|T302464]] test
* 06:56 elukey: restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to)
* 06:46 _joe_: uploaded scap 4.4.1 to <nowiki>{</nowiki>stretch,buster,bullseye<nowiki>}</nowiki> [[phab:T302464|T302464]]
* 06:46 _joe_: uploaded scap 4.4.1 to <nowiki>{</nowiki>stretch,buster,bullseye<nowiki>}</nowiki>
* 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json
* 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json
* 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json
* 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json
* 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 ([[phab:T302185|T302185]])', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
* 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
* 00:17 mutante: 15.wikipedia.org on k8s (staging) deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' {{!}} grep grandpa  =>  "&ldquo;Wikipedia is like an all-knowing grandpa.&rdquo;" {{!}} [[phab:T300171|T300171]]


==Archives==
==Archives==

Revision as of 01:06, 21 May 2022

2022-05-21

  • 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298555)', diff saved to https://phabricator.wikimedia.org/P28208 and previous config saved to /var/cache/conftool/dbconfig/20220521-010640-ladsgroup.json
  • 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298555)', diff saved to https://phabricator.wikimedia.org/P28207 and previous config saved to /var/cache/conftool/dbconfig/20220521-010626-ladsgroup.json
  • 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298555)', diff saved to https://phabricator.wikimedia.org/P28206 and previous config saved to /var/cache/conftool/dbconfig/20220521-001014-ladsgroup.json
  • 00:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 00:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance

2022-05-20

  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28205 and previous config saved to /var/cache/conftool/dbconfig/20220520-224558-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28204 and previous config saved to /var/cache/conftool/dbconfig/20220520-223054-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28203 and previous config saved to /var/cache/conftool/dbconfig/20220520-221550-ladsgroup.json
  • 22:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
  • 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28202 and previous config saved to /var/cache/conftool/dbconfig/20220520-220046-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 21:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298555)', diff saved to https://phabricator.wikimedia.org/P28201 and previous config saved to /var/cache/conftool/dbconfig/20220520-215514-ladsgroup.json
  • 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 21:50 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 21:38 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye
  • 21:37 mutante: correction: mistake was to use FQDN T307142
  • 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError T307142
  • 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError
  • 21:34 mutante: reimaging gitlab1004 (insetup) to test partman recipe from gerrit:793534 - T307142
  • 21:34 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
  • 21:33 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298555)', diff saved to https://phabricator.wikimedia.org/P28198 and previous config saved to /var/cache/conftool/dbconfig/20220520-190633-ladsgroup.json
  • 19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 18:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:55 mutante: [mwmaint1002:~] $ sudo mwscript initSiteStats.php --wiki=kcgwiki --update (to update statistics for latest wikipedia kcg) T305281
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5003.eqsin.wmnet with OS bullseye
  • 17:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 17:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
  • 16:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:37 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti5003.eqsin.wmnet with OS bullseye
  • 16:33 robh: troubleshooting ganeti5003 ipmi failure via T308211
  • 16:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:09 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 16:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
  • 16:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2069.codfw.wmnet with OS bullseye
  • 15:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
  • 15:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2068.codfw.wmnet with OS bullseye
  • 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
  • 15:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
  • 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2069.codfw.wmnet with OS bullseye
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
  • 15:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2068.codfw.wmnet with OS bullseye
  • 15:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1118 T', diff saved to https://phabricator.wikimedia.org/P28196 and previous config saved to /var/cache/conftool/dbconfig/20220520-151407-ladsgroup.json
  • 15:11 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28195 and previous config saved to /var/cache/conftool/dbconfig/20220520-150838-root.json
  • 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28194 and previous config saved to /var/cache/conftool/dbconfig/20220520-145334-root.json
  • 14:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2066.codfw.wmnet with OS bullseye
  • 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298555)', diff saved to https://phabricator.wikimedia.org/P28193 and previous config saved to /var/cache/conftool/dbconfig/20220520-144212-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T298565)', diff saved to https://phabricator.wikimedia.org/P28192 and previous config saved to /var/cache/conftool/dbconfig/20220520-144111-ladsgroup.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28191 and previous config saved to /var/cache/conftool/dbconfig/20220520-143830-root.json
  • 14:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 14:28 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28190 and previous config saved to /var/cache/conftool/dbconfig/20220520-142327-root.json
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T303603)', diff saved to https://phabricator.wikimedia.org/P28189 and previous config saved to /var/cache/conftool/dbconfig/20220520-142032-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T303603)', diff saved to https://phabricator.wikimedia.org/P28188 and previous config saved to /var/cache/conftool/dbconfig/20220520-141316-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T303603)', diff saved to https://phabricator.wikimedia.org/P28187 and previous config saved to /var/cache/conftool/dbconfig/20220520-141308-ladsgroup.json
  • 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS bullseye
  • 14:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28186 and previous config saved to /var/cache/conftool/dbconfig/20220520-140823-root.json
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T303603)', diff saved to https://phabricator.wikimedia.org/P28185 and previous config saved to /var/cache/conftool/dbconfig/20220520-135350-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28184 and previous config saved to /var/cache/conftool/dbconfig/20220520-135319-root.json
  • 13:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T298565)', diff saved to https://phabricator.wikimedia.org/P28183 and previous config saved to /var/cache/conftool/dbconfig/20220520-134515-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 13:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 1%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28182 and previous config saved to /var/cache/conftool/dbconfig/20220520-133815-root.json
  • 13:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: T308459
  • 13:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: T308459
  • 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-tls
  • 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=varnish-fe
  • 13:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-be
  • 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T303603)', diff saved to https://phabricator.wikimedia.org/P28181 and previous config saved to /var/cache/conftool/dbconfig/20220520-132307-ladsgroup.json
  • 13:15 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye
  • 12:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye
  • 12:42 mforns@deploy1002: Finished deploy [airflow-dags/analytics@51a203f]: (no justification provided) (duration: 00m 07s)
  • 12:42 mforns@deploy1002: Started deploy [airflow-dags/analytics@51a203f]: (no justification provided)
  • 12:37 moritzm: copy prometheus-mcrouter-exporter from buster-wikimedia to bullseye-wikimedia (needed for T308214)
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T303603)', diff saved to https://phabricator.wikimedia.org/P28180 and previous config saved to /var/cache/conftool/dbconfig/20220520-123045-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T303603)', diff saved to https://phabricator.wikimedia.org/P28179 and previous config saved to /var/cache/conftool/dbconfig/20220520-123037-ladsgroup.json
  • 12:23 Amir1: killed refreshlinks suggestion in 10160
  • 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298555)', diff saved to https://phabricator.wikimedia.org/P28178 and previous config saved to /var/cache/conftool/dbconfig/20220520-121116-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:10 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
  • 11:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298555)', diff saved to https://phabricator.wikimedia.org/P28177 and previous config saved to /var/cache/conftool/dbconfig/20220520-114234-ladsgroup.json
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T303603)', diff saved to https://phabricator.wikimedia.org/P28176 and previous config saved to /var/cache/conftool/dbconfig/20220520-114202-ladsgroup.json
  • 11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T303603)', diff saved to https://phabricator.wikimedia.org/P28175 and previous config saved to /var/cache/conftool/dbconfig/20220520-113207-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T303603)', diff saved to https://phabricator.wikimedia.org/P28174 and previous config saved to /var/cache/conftool/dbconfig/20220520-112449-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T298555)', diff saved to https://phabricator.wikimedia.org/P28173 and previous config saved to /var/cache/conftool/dbconfig/20220520-111239-ladsgroup.json
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 8:00:00 on 8 hosts with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 8:00:00 on 8 hosts with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:09 jynus: drop backupcheck users from m1>dbbackups
  • 10:54 moritzm: uploaded cas 6.4.6.3-wmf11u1 to apt.wikimedia.org/bullseye
  • 10:52 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
  • 10:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
  • 10:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert read new on frwiki for templatelinks migration (duration: 00m 51s)
  • 10:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2063.codfw.wmnet with OS bullseye
  • 09:39 volans@cumin1001: dbctl commit (dc=all): 'emergency depool', diff saved to https://phabricator.wikimedia.org/P28172 and previous config saved to /var/cache/conftool/dbconfig/20220520-093928-volans.json
  • 09:34 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage
  • 09:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage
  • 09:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2063.codfw.wmnet with OS bullseye
  • 08:54 vgutierrez: re-enabling puppet and repooling cp3060 - T308797 T243167
  • 08:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2062.codfw.wmnet with OS bullseye
  • 08:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage
  • 08:09 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P28171 and previous config saved to /var/cache/conftool/dbconfig/20220520-080719-root.json
  • 07:53 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2062.codfw.wmnet with OS bullseye
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P28170 and previous config saved to /var/cache/conftool/dbconfig/20220520-075215-root.json
  • 07:52 jayme: imported kubeconform 0.4.13-1 to buster-,bullseye-wikimedia - T306165
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P28169 and previous config saved to /var/cache/conftool/dbconfig/20220520-073712-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P28168 and previous config saved to /var/cache/conftool/dbconfig/20220520-072208-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P28167 and previous config saved to /var/cache/conftool/dbconfig/20220520-070704-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P28166 and previous config saved to /var/cache/conftool/dbconfig/20220520-065200-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P28164 and previous config saved to /var/cache/conftool/dbconfig/20220520-063656-root.json
  • 06:03 moritzm: racadm racreset on ganeti5003
  • 05:09 marostegui: dbmaint s1@eqiad T298554
  • 01:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298560)', diff saved to https://phabricator.wikimedia.org/P28162 and previous config saved to /var/cache/conftool/dbconfig/20220520-010743-ladsgroup.json
  • 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28161 and previous config saved to /var/cache/conftool/dbconfig/20220520-005237-ladsgroup.json
  • 00:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon1003.wikimedia.org with OS bullseye
  • 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28160 and previous config saved to /var/cache/conftool/dbconfig/20220520-003732-ladsgroup.json
  • 00:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
  • 00:29 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bullseye
  • 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298560)', diff saved to https://phabricator.wikimedia.org/P28159 and previous config saved to /var/cache/conftool/dbconfig/20220520-002227-ladsgroup.json

2022-05-19

  • 23:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host netmon1003.wikimedia.org with OS bullseye
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bullseye
  • 22:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:07 robh: cp3060 idrac interface frozen, rebooted via power outlet control on T243167
  • 20:49 thcipriani: UTC late deploys done
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:40 bking@deploy1002: Synchronized static/images/project-logos: Config: zhwikiversity: Optimize logo per commons files (T308620) (duration: 00m 51s)
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:34 bking@deploy1002: Synchronized logos/config.yaml: Config: zhwikiversity: Declare commons files for logo and its variant (T308620) (duration: 00m 50s)
  • 20:33 bking@deploy1002: Synchronized wmf-config/logos.php: Config: zhwikiversity: Declare commons files for logo and its variant (T308620) (duration: 00m 53s)
  • 20:24 bking@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: bnwikivoyage: Set $wgRelatedArticlesUseCirrusSearch to true on bnwikivoyage (T307904) (duration: 00m 50s)
  • 20:21 robh: ganeti5003 updating firmware via T308211
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:59 damilare: payments-wiki from 464e3b0e to 592c6d34
  • 19:58 inflatador: bking@relforge1004: banned relforge1003 from main and alpha clusters in preparation for reimage T308770
  • 19:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:05 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:01 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:49 ryankemper: [WDQS Deploy] `Unknown` status resolved following deploy of https://gerrit.wikimedia.org/r/793530 ; wdqs categories monitoring is healthy again. We're done here
  • 18:45 ryankemper: [WDQS Deploy] Deployed https://gerrit.wikimedia.org/r/793530; ran puppet agent across wdqs* and just kicked off a re-check of the NRPE alerts. We'll see if that clears the Unknown state up
  • 18:29 ryankemper: [WDQS Deploy] Okay, so a recent refactor changed where the `check_categories.py` lives. Previously it was `/usr/lib/nagios/plugins/check_categories.py` and now it's `/usr/local/lib/nagios/plugins/check_categories.py`. So https://gerrit.wikimedia.org/r/793530 should fix things now
  • 18:18 ryankemper: [WDQS Deploy] Traced the failure back to https://gerrit.wikimedia.org/r/c/operations/puppet/+/792700 presumably; trying to see what we can do to fix up the patch without having to revert it since it touches stuff besides query service
  • 17:55 ryankemper: [WDQS Deploy] Slight amendment to the above; we're seeing status `Unknown` for `Categories endpoint` and `Categories update lag`. They've been warning for ~24h so it didn't surface following the deploy, but looking into that now
  • 17:51 ryankemper: T306899 Rolled `wdqs` and `wcqs` deploys to adjust logging settings. Hoping this gives us more visibility on the 500 errors WCQS users have been experiencing.
  • 17:50 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 17:30 ryankemper: [WCQS Deploy] Successful test query placed on commons-query.wikimedia.org, there's no relevant criticals in Icinga, and Grafana looks good. WCQS deploy complete
  • 17:30 ryankemper: [WCQS Deploy] Restarted `wcqs-updater` across all hosts: `sudo -E cumin 'A:wcqs-public' 'systemctl restart wcqs-updater'`
  • 17:29 ryankemper: [WCQS Deploy] Tests looked good following deploy of `0.3.111` to canary `wcqs1002.eqiad.wmnet`; proceeded to rest of fleet
  • 17:29 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@a493d7f] (wcqs): Deploy 0.3.111 to WCQS (duration: 03m 03s)
  • 17:26 ryankemper@deploy1002: Started deploy [wdqs/wdqs@a493d7f] (wcqs): Deploy 0.3.111 to WCQS
  • 17:26 ryankemper: [WCQS Deploy] Gearing up for deploy of wcqs `0.3.111`
  • 17:24 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 17:24 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 17:23 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 17:22 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@a493d7f]: 0.3.111 (duration: 08m 11s)
  • 17:16 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.111` on canary `wdqs1003`; proceeding to rest of fleet
  • 17:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@a493d7f]: 0.3.111
  • 17:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.111`. Pre-deploy tests passing on canary `wdqs1003`
  • 17:03 otto@deploy1002: Finished deploy [airflow-dags/analytics@95c1f50]: (no justification provided) (duration: 00m 21s)
  • 17:03 otto@deploy1002: Started deploy [airflow-dags/analytics@95c1f50]: (no justification provided)
  • 16:56 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@95c1f50]: (no justification provided) (duration: 00m 12s)
  • 16:55 otto@deploy1002: Started deploy [airflow-dags/analytics_test@95c1f50]: (no justification provided)
  • 16:37 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 16:35 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 16:31 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 16:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298560)', diff saved to https://phabricator.wikimedia.org/P28155 and previous config saved to /var/cache/conftool/dbconfig/20220519-161022-ladsgroup.json
  • 16:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298560)', diff saved to https://phabricator.wikimedia.org/P28154 and previous config saved to /var/cache/conftool/dbconfig/20220519-161014-ladsgroup.json
  • 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 15:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P28153 and previous config saved to /var/cache/conftool/dbconfig/20220519-155509-ladsgroup.json
  • 15:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gerrit2002.wikimedia.org with OS bullseye
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T303603)', diff saved to https://phabricator.wikimedia.org/P28152 and previous config saved to /var/cache/conftool/dbconfig/20220519-154124-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P28151 and previous config saved to /var/cache/conftool/dbconfig/20220519-154003-ladsgroup.json
  • 15:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
  • 15:28 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P28150 and previous config saved to /var/cache/conftool/dbconfig/20220519-152618-ladsgroup.json
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298560)', diff saved to https://phabricator.wikimedia.org/P28149 and previous config saved to /var/cache/conftool/dbconfig/20220519-152457-ladsgroup.json
  • 15:24 ariel@deploy1002: Finished deploy [dumps/dumps@cd30939]: use dbgroupdefault for most jobs (duration: 00m 04s)
  • 15:24 ariel@deploy1002: Started deploy [dumps/dumps@cd30939]: use dbgroupdefault for most jobs
  • 15:23 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti5003.eqsin.wmnet with reason: Remove from cluster for firmware update and eventual reimage
  • 15:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti5003.eqsin.wmnet with reason: Remove from cluster for firmware update and eventual reimage
  • 15:19 oblivian@deploy1002: Synchronized README: null sync-file to verify the switch to the deployment group (duration: 00m 50s)
  • 15:14 _joe_: deploy1002:/srv/mediawiki-staging $ find . -group wikidev -print0 | sudo xargs -0 -n 100 chgrp -h deployment --
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P28148 and previous config saved to /var/cache/conftool/dbconfig/20220519-151113-ladsgroup.json
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 15:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:00 _joe_: oblivian@deploy2002:/srv/mediawiki-staging $ sudo find . -group wikidev -exec chgrp wikidev "{}" \;
  • 15:00 papaul: powerdown gerrit2002 for relocation
  • 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T303603)', diff saved to https://phabricator.wikimedia.org/P28147 and previous config saved to /var/cache/conftool/dbconfig/20220519-145608-ladsgroup.json
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T303603)', diff saved to https://phabricator.wikimedia.org/P28145 and previous config saved to /var/cache/conftool/dbconfig/20220519-144021-ladsgroup.json
  • 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T303603)', diff saved to https://phabricator.wikimedia.org/P28144 and previous config saved to /var/cache/conftool/dbconfig/20220519-144013-ladsgroup.json
  • 14:36 tgr: EU mid-day deploys done
  • 14:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrothExperiments: Enable Add Link frontend on tier 3 wikis (T304542) (duration: 00m 50s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 14:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P28143 and previous config saved to /var/cache/conftool/dbconfig/20220519-142507-ladsgroup.json
  • 14:23 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:22 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
  • 14:20 tgr@deploy1002: Synchronized static/images/project-logos: Config: zhwikiquote: Optimize logo per commons files (T308620) (duration: 00m 50s)
  • 14:18 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T298557)', diff saved to https://phabricator.wikimedia.org/P28142 and previous config saved to /var/cache/conftool/dbconfig/20220519-141453-marostegui.json
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P28141 and previous config saved to /var/cache/conftool/dbconfig/20220519-141001-ladsgroup.json
  • 14:09 jayme: systemctl restart rsyslog on kubernetes1011,kubestage1003
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 13:58 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: votewiki: Change wgLanguageCode to zh for May 2022 zhwiki admin election (T308397) (duration: 00m 52s)
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T298557)', diff saved to https://phabricator.wikimedia.org/P28140 and previous config saved to /var/cache/conftool/dbconfig/20220519-135632-marostegui.json
  • 13:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298557)', diff saved to https://phabricator.wikimedia.org/P28139 and previous config saved to /var/cache/conftool/dbconfig/20220519-135624-marostegui.json
  • 13:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T303603)', diff saved to https://phabricator.wikimedia.org/P28138 and previous config saved to /var/cache/conftool/dbconfig/20220519-135456-ladsgroup.json
  • 13:52 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.12 refs T305218
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P28137 and previous config saved to /var/cache/conftool/dbconfig/20220519-134119-marostegui.json
  • 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 13:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P28136 and previous config saved to /var/cache/conftool/dbconfig/20220519-132614-marostegui.json
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:21 jnuche@deploy1002: Synchronized php-1.39.0-wmf.12/extensions/FileImporter/src/Services/WikiRevisionFactory.php: Backport: Revert "Fix bogus user object creation in WikiRevisionFactory" (T308691) (duration: 00m 53s)
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298557)', diff saved to https://phabricator.wikimedia.org/P28135 and previous config saved to /var/cache/conftool/dbconfig/20220519-131108-marostegui.json
  • 13:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T303603)', diff saved to https://phabricator.wikimedia.org/P28134 and previous config saved to /var/cache/conftool/dbconfig/20220519-125442-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T303603)', diff saved to https://phabricator.wikimedia.org/P28133 and previous config saved to /var/cache/conftool/dbconfig/20220519-125434-ladsgroup.json
  • 12:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T298557)', diff saved to https://phabricator.wikimedia.org/P28131 and previous config saved to /var/cache/conftool/dbconfig/20220519-124456-marostegui.json
  • 12:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 12:40 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5002.eqsin.wmnet to ganeti01.svc.eqsin.wmnet
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P28130 and previous config saved to /var/cache/conftool/dbconfig/20220519-123927-ladsgroup.json
  • 12:39 root@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5002.eqsin.wmnet to ganeti01.svc.eqsin.wmnet
  • 12:37 marostegui: dbmaint s1@eqiad T300775
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298560)', diff saved to https://phabricator.wikimedia.org/P28129 and previous config saved to /var/cache/conftool/dbconfig/20220519-123227-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T298560)', diff saved to https://phabricator.wikimedia.org/P28128 and previous config saved to /var/cache/conftool/dbconfig/20220519-123219-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P28127 and previous config saved to /var/cache/conftool/dbconfig/20220519-122422-ladsgroup.json
  • 12:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
  • 12:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P28126 and previous config saved to /var/cache/conftool/dbconfig/20220519-121714-ladsgroup.json
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T303603)', diff saved to https://phabricator.wikimedia.org/P28125 and previous config saved to /var/cache/conftool/dbconfig/20220519-120917-ladsgroup.json
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
  • 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
  • 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298557)', diff saved to https://phabricator.wikimedia.org/P28124 and previous config saved to /var/cache/conftool/dbconfig/20220519-120521-marostegui.json
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P28123 and previous config saved to /var/cache/conftool/dbconfig/20220519-120209-ladsgroup.json
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 11:59 marostegui: Failover m5 master T307673
  • 11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T303603)', diff saved to https://phabricator.wikimedia.org/P28122 and previous config saved to /var/cache/conftool/dbconfig/20220519-115303-ladsgroup.json
  • 11:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 11:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T303603)', diff saved to https://phabricator.wikimedia.org/P28121 and previous config saved to /var/cache/conftool/dbconfig/20220519-115255-ladsgroup.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P28120 and previous config saved to /var/cache/conftool/dbconfig/20220519-115016-marostegui.json
  • 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T298560)', diff saved to https://phabricator.wikimedia.org/P28119 and previous config saved to /var/cache/conftool/dbconfig/20220519-114703-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P28118 and previous config saved to /var/cache/conftool/dbconfig/20220519-113750-ladsgroup.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P28117 and previous config saved to /var/cache/conftool/dbconfig/20220519-113511-marostegui.json
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 11:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P28116 and previous config saved to /var/cache/conftool/dbconfig/20220519-112245-ladsgroup.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298557)', diff saved to https://phabricator.wikimedia.org/P28115 and previous config saved to /var/cache/conftool/dbconfig/20220519-112006-marostegui.json
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T303603)', diff saved to https://phabricator.wikimedia.org/P28114 and previous config saved to /var/cache/conftool/dbconfig/20220519-110740-ladsgroup.json
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T298557)', diff saved to https://phabricator.wikimedia.org/P28113 and previous config saved to /var/cache/conftool/dbconfig/20220519-105637-marostegui.json
  • 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (