Server Admin Log: Difference between revisions

== 2022-05-04 ==
* 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27368 and previous config saved to /var/cache/conftool/dbconfig/20220504-004954-ladsgroup.json
* 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=ats-tls
* 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=varnish-fe
* 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=ats-be
* 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27367 and previous config saved to /var/cache/conftool/dbconfig/20220504-003449-ladsgroup.json
* 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27366 and previous config saved to /var/cache/conftool/dbconfig/20220504-001944-ladsgroup.json
* 00:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27365 and previous config saved to /var/cache/conftool/dbconfig/20220504-001326-ladsgroup.json
* 00:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 00:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: [[phab:T307525|T307525]]', diff saved to https://phabricator.wikimedia.org/P27364 and previous config saved to /var/cache/conftool/dbconfig/20220504-001205-ladsgroup.json
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
== 2022-05-03 ==
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: [[phab:T307525|T307525]]', diff saved to https://phabricator.wikimedia.org/P27363 and previous config saved to /var/cache/conftool/dbconfig/20220503-235701-ladsgroup.json
* 23:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 23:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 23:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 23:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27362 and previous config saved to /var/cache/conftool/dbconfig/20220503-234451-ladsgroup.json
* 23:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 23:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:23 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:788869{{!}}Revert "cirrus: Move query traffic to codfw for maintenance" (T306811)]] (duration: 00m 56s)
* 23:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:58 mutante: added image-suggestion to kube_services.certs.yaml in private repo, generated new certs and git committed them [[phab:T304891|T304891]]
* 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:57 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:788822{{!}}cirrus: Move query traffic to codfw for maintenance (T306811)]] (duration: 00m 49s)
* 22:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:55 ebernhardson@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:788820{{!}}Revert "translate: Move ttmserver queries to codfw" (T306811)]] (duration: 00m 50s)
* 22:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 22:46 brennen: train 1.39.0-wmf.10 ([[phab:T305216|T305216]]): [[phab:T307513|T307513]] doesn't seem quite resolved - parking the train at testwikis until european morning
* 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:43 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:788818{{!}}Set mediawikiwiki to READ NEW for templatelinks migration (T306673)]] (duration: 00m 50s)
* 22:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.9/extensions/TitleBlacklist: Backport: [[gerrit:788816{{!}}wmf.9 HACK: add forward class alias for TitleBlacklist (T307513)]] (duration: 00m 50s)
* 22:41 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 22:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:36 mutante: ns0: authdns-update - deploying DNS change, add new svc and discovery records for image-suggestion [[phab:T304891|T304891]]
* 22:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 22:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 22:17 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.9"
* 22:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 22:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 22:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.10  refs [[phab:T305216|T305216]]
* 22:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:02 brennen@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/TitleBlacklist: Backport: [[gerrit:788745{{!}}Add class alias for TitleBlacklist and bump cache version (T307513)]] (duration: 00m 50s)
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 21:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 21:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 21:12 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.9"
* 21:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.10  refs [[phab:T305216|T305216]]
* 21:02 brennen: train 1.39.0-wmf.10 ([[phab:T305216|T305216]]): no current blockers, proceeding to group0
* 21:01 cjming: end of UTC late backport & config window
* 20:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 20:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 20:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 20:53 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:785208{{!}}Fix: Enable '$wgCopyUploadsDomains' to viwiki (T303577)]] (duration: 00m 50s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 20:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 20:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 20:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:787749{{!}}Image Suggestions Feedback Stream]] (duration: 00m 50s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 ebernhardson: start ttmserver-export.php from Translate against eqiad search cluster for [[phab:T306811|T306811]]
* 20:17 cjming@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:788773{{!}}translate: Move ttmserver queries to codfw (T306811)]] (duration: 00m 50s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:16 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1033.eqiad.wmnet
* 20:16 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2034.codfw.wmnet
* 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:788754{{!}}Re-enable disabled Special pages on medium wikis (T48094)]] (duration: 00m 55s)
* 20:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2034.codfw.wmnet
* 20:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1033.eqiad.wmnet
* 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27361 and previous config saved to /var/cache/conftool/dbconfig/20220503-200634-ladsgroup.json
* 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27360 and previous config saved to /var/cache/conftool/dbconfig/20220503-195129-ladsgroup.json
* 19:49 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host logstash1032.eqiad.wmnet
* 19:48 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2033.codfw.wmnet
* 19:40 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2033.codfw.wmnet
* 19:40 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1032.eqiad.wmnet
* 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27359 and previous config saved to /var/cache/conftool/dbconfig/20220503-193624-ladsgroup.json
* 19:35 herron@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host logstash2031.codfw.wmnet
* 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27358 and previous config saved to /var/cache/conftool/dbconfig/20220503-192119-ladsgroup.json
* 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27357 and previous config saved to /var/cache/conftool/dbconfig/20220503-191909-ladsgroup.json
* 19:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 19:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
* 19:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 19:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: [[phab:T306560|T306560]]', diff saved to https://phabricator.wikimedia.org/P27356 and previous config saved to /var/cache/conftool/dbconfig/20220503-183901-ladsgroup.json
* 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: [[phab:T306560|T306560]]', diff saved to https://phabricator.wikimedia.org/P27355 and previous config saved to /var/cache/conftool/dbconfig/20220503-182357-ladsgroup.json
* 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T306560|T306560]])', diff saved to https://phabricator.wikimedia.org/P27354 and previous config saved to /var/cache/conftool/dbconfig/20220503-181457-ladsgroup.json
* 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* afk: train 1.39.0-wmf.10 ([[phab:T305216|T305216]]):  amending prior logline: planning to move to _group0_ on return
* 18:04 ebernhardson: start ttmserver-export.php from Translate against codfw search cluster for [[phab:T306811|T306811]]
* 18:04 brennen: train 1.39.0-wmf.10 ([[phab:T305216|T305216]]):  train is still blocked on [[phab:T307019|T307019]], although in practice that blocker doesn't prevent us from going ahead safely.  i'm going unavoidably afk for a couple of hours; plan to move train to group1 on my return.
* 18:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1031.eqiad.wmnet
* 17:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2031.codfw.wmnet
* 17:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1031.eqiad.wmnet
* 17:57 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1030.eqiad.wmnet
* 17:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2030.codfw.wmnet
* 17:50 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2030.codfw.wmnet
* 17:49 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1030.eqiad.wmnet
* 17:32 mutante: removing geoip-database from all install hosts with [cumin2002:~] $ sudo cumin 'install*' 'apt-get remove geoip-database'
* 17:30 mutante: install2003 - apt-get remove geoip-databases
* 17:25 mutante: [webperf1002:~] $ sudo systemctl start arclamp_compress_logs (had failed with https://ms-fe.svc.eqiad.wmnet/... returning 503, but worked fine when started manually)
* 17:25 mutante: [webperf1002:~] $ sudo systemctl status arclamp_compress_logs
* 17:20 mutante: install1003 - apt-get remove geoip-database libgeoip1  and running puppet. I don't see why these are installed here
* 15:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1041.eqiad.wmnet with OS bullseye
* 15:42 sukhe: enable puppet on A:dns-rec and A:wikidough
* 15:10 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2029.codfw.wmnet
* 15:09 sukhe: disable puppet on A:wikidough to deploy CR 779936
* 15:08 sukhe: disable puppet on A:dns-rec to deploy CR 779936
* 15:05 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2029.codfw.wmnet
* 15:01 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1041.eqiad.wmnet with reason: host reimage
* 14:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1041.eqiad.wmnet with reason: host reimage
* 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
* 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2012.codfw.wmnet: Restarting for cert refresh - hnowlan@cumin1001
* 14:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1029.eqiad.wmnet
* 14:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1041.eqiad.wmnet with OS bullseye
* 14:46 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be1041.eqiad.wmnet with OS bullseye
* 14:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
* 14:42 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1029.eqiad.wmnet
* 14:40 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2012.codfw.wmnet: Restarting for cert refresh - hnowlan@cumin1001
* 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2026.codfw.wmnet with reason: reboot
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2026.codfw.wmnet with reason: reboot
* 14:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase101[6-8].eqiad.wmnet: Restarting for cert refresh - hnowlan@cumin1001
* 14:27 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1041.eqiad.wmnet with OS bullseye
* 14:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2043.codfw.wmnet with OS bullseye
* 14:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1028.eqiad.wmnet
* 14:11 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1028.eqiad.wmnet
* 14:07 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase101[6-8].eqiad.wmnet: Restarting for cert refresh - hnowlan@cumin1001
* 14:03 vgutierrez: upgrade haproxy to 2.4.16 on cp3050 - [[phab:T307444|T307444]]
* 14:02 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2028.codfw.wmnet
* 13:56 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2028.codfw.wmnet
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1027.eqiad.wmnet
* 13:54 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2027.codfw.wmnet
* 13:47 moritzm: stopped/masked coal/navtiming on webperf1001/webperf2001 [[phab:T305460|T305460]]
* 13:46 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2027.codfw.wmnet
* 13:45 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1027.eqiad.wmnet
* 13:45 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1002.eqiad.wmnet with OS bullseye
* 13:40 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2043.codfw.wmnet with reason: host reimage
* 13:36 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2043.codfw.wmnet with reason: host reimage
* 13:28 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1002.eqiad.wmnet with reason: host reimage
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 Lucas_WMDE: UTC afternoon backport window done
* 13:25 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1002.eqiad.wmnet with reason: host reimage
* 13:22 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2043.codfw.wmnet with OS bullseye
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:788356{{!}}Use "unexpectedUnconnectedPage" page prop everywhere]] (duration: 00m 51s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747196{{!}}Remove unused maps event stream (T293366)]] (duration: 01m 04s)
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
* 13:08 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1002.eqiad.wmnet with OS bullseye
* 13:08 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
* 13:08 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
* 13:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2026.codfw.wmnet
* 13:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1026.eqiad.wmnet
* 12:58 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2026.codfw.wmnet
* 12:58 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1026.eqiad.wmnet
* 12:57 vgutierrez: rolling downgrade of HAProxy to version 2.4.15 on upload - [[phab:T307444|T307444]]
* 10:58 vgutierrez: rolling downgrade of HAProxy to version 2.4.15 on text - [[phab:T307444|T307444]]
* 10:57 jbond: restrict ports allowed via squid
* 10:46 vgutierrez: downgrade haproxy 2.4 package to version 2.4.15 on apt.wm.o (buster-wikimedia)
* 09:47 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb2001-dev.codfw.wmnet
* 09:40 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb2001-dev.codfw.wmnet
* 09:38 gehel: resetting BMC on relforge1003 and relforge1004 - https://wikitech.wikimedia.org/wiki/Management_Interfaces#From_local_IPMI
* 09:32 vgutierrez: rolling upgrade of HAProxy in eqiad
* 09:14 marostegui: Disable puppet on clouddb1013 clouddb1016 clouddb1020 [[phab:T305974|T305974]]
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:51 hashar@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.10 refs [[phab:T305216|T305216]] (duration: 30m 44s)
* 08:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2001.codfw.wmnet with OS buster
* 08:44 vgutierrez: rolling upgrade of HAProxy in esams
* 08:29 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1002.eqiad.wmnet
* 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:24 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1002.eqiad.wmnet
* 08:21 hashar@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.10 refs [[phab:T305216|T305216]]
* 08:14 hashar: Starting MediaWiki train deployment using `scap stage-train 1.39.0-wmf.10` # [[phab:T305216|T305216]]
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1132 [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P27350 and previous config saved to /var/cache/conftool/dbconfig/20220503-080421-marostegui.json
* 08:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2001.codfw.wmnet with reason: host reimage
* 07:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2001.codfw.wmnet with reason: host reimage
* 07:33 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores2001.codfw.wmnet with OS buster
* 07:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 taavi: UTC morning deploys done
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nikki Nikkhoui out of all services on: 1224 hosts
* 07:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nikki Nikkhoui out of all services on: 1224 hosts
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nikki Nikkhoui out of all services on: 513 hosts
* 07:18 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nikki Nikkhoui out of all services on: 513 hosts
* 07:15 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:787689{{!}}Enable the versioned mapdata API (T307110)]] (duration: 00m 48s)
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
== 2022-05-02 ==
* 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb2002.codfw.wmnet with OS bullseye

Revision as of 00:50, 4 May 2022

2022-05-04

  • 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27368 and previous config saved to /var/cache/conftool/dbconfig/20220504-004954-ladsgroup.json
  • 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=ats-tls
  • 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=varnish-fe
  • 00:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5002.eqsin.wmnet,service=ats-be
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27367 and previous config saved to /var/cache/conftool/dbconfig/20220504-003449-ladsgroup.json
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T307525)', diff saved to https://phabricator.wikimedia.org/P27366 and previous config saved to /var/cache/conftool/dbconfig/20220504-001944-ladsgroup.json
  • 00:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T307525)', diff saved to https://phabricator.wikimedia.org/P27365 and previous config saved to /var/cache/conftool/dbconfig/20220504-001326-ladsgroup.json
  • 00:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: T307525', diff saved to https://phabricator.wikimedia.org/P27364 and previous config saved to /var/cache/conftool/dbconfig/20220504-001205-ladsgroup.json
  • 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-05-03

  • 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: T307525', diff saved to https://phabricator.wikimedia.org/P27363 and previous config saved to /var/cache/conftool/dbconfig/20220503-235701-ladsgroup.json
  • 23:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 23:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 23:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 23:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T307525)', diff saved to https://phabricator.wikimedia.org/P27362 and previous config saved to /var/cache/conftool/dbconfig/20220503-234451-ladsgroup.json
  • 23:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 23:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:23 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "cirrus: Move query traffic to codfw for maintenance" (T306811) (duration: 00m 56s)
  • 23:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:58 mutante: added image-suggestion to kube_services.certs.yaml in private repo, generated new certs and git committed them T304891
  • 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:57 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: cirrus: Move query traffic to codfw for maintenance (T306811) (duration: 00m 49s)
  • 22:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:55 ebernhardson@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "translate: Move ttmserver queries to codfw" (T306811) (duration: 00m 50s)
  • 22:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 22:46 brennen: train 1.39.0-wmf.10 (T305216): T307513 doesn't seem quite resolved - parking the train at testwikis until european morning
  • 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:43 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set mediawikiwiki to READ NEW for templatelinks migration (T306673) (duration: 00m 50s)
  • 22:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.9/extensions/TitleBlacklist: Backport: wmf.9 HACK: add forward class alias for TitleBlacklist (T307513) (duration: 00m 50s)
  • 22:41 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 22:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:36 mutante: ns0: authdns-update - deploying DNS change, add new svc and discovery records for image-suggestion T304891
  • 22:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 22:17 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.9"
  • 22:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 22:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 22:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.10 refs T305216
  • 22:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:02 brennen@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/TitleBlacklist: Backport: Add class alias for TitleBlacklist and bump cache version (T307513) (duration: 00m 50s)
  • 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 21:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 21:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 21:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 21:12 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.9"
  • 21:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.10 refs T305216
  • 21:02 brennen: train 1.39.0-wmf.10 (T305216): no current blockers, proceeding to group0
  • 21:01 cjming: end of UTC late backport & config window
  • 20:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 20:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 20:53 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix: Enable '$wgCopyUploadsDomains' to viwiki (T303577) (duration: 00m 50s)
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 20:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 20:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:25 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Image Suggestions Feedback Stream (duration: 00m 50s)
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:20 ebernhardson: start ttmserver-export.php from Translate against eqiad search cluster for T306811
  • 20:17 cjming@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: translate: Move ttmserver queries to codfw (T306811) (duration: 00m 50s)
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:16 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1033.eqiad.wmnet
  • 20:16 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2034.codfw.wmnet
  • 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable disabled Special pages on medium wikis (T48094) (duration: 00m 55s)
  • 20:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2034.codfw.wmnet
  • 20:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1033.eqiad.wmnet
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T306560)', diff saved to https://phabricator.wikimedia.org/P27361 and previous config saved to /var/cache/conftool/dbconfig/20220503-200634-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27360 and previous config saved to /var/cache/conftool/dbconfig/20220503-195129-ladsgroup.json
  • 19:49 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host logstash1032.eqiad.wmnet
  • 19:48 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2033.codfw.wmnet
  • 19:40 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2033.codfw.wmnet
  • 19:40 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1032.eqiad.wmnet
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27359 and previous config saved to /var/cache/conftool/dbconfig/20220503-193624-ladsgroup.json
  • 19:35 herron@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host logstash2031.codfw.wmnet
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T306560)', diff saved to https://phabricator.wikimedia.org/P27358 and previous config saved to /var/cache/conftool/dbconfig/20220503-192119-ladsgroup.json
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T306560)', diff saved to https://phabricator.wikimedia.org/P27357 and previous config saved to /var/cache/conftool/dbconfig/20220503-191909-ladsgroup.json
  • 19:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 19:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 19:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 19:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: T306560', diff saved to https://phabricator.wikimedia.org/P27356 and previous config saved to /var/cache/conftool/dbconfig/20220503-183901-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: T306560', diff saved to https://phabricator.wikimedia.org/P27355 and previous config saved to /var/cache/conftool/dbconfig/20220503-182357-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T306560)', diff saved to https://phabricator.wikimedia.org/P27354 and previous config saved to /var/cache/conftool/dbconfig/20220503-181457-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • afk: train 1.39.0-wmf.10 (T305216): amending prior logline: planning to move to _group0_ on return
  • 18:04 ebernhardson: start ttmserver-export.php from Translate against codfw search cluster for T306811
  • 18:04 brennen: train 1.39.0-wmf.10 (T305216): train is still blocked on T307019, although in practice that blocker doesn't prevent us from going ahead safely. i'm going unavoidably afk for a couple of hours; plan to move train to group1 on my return.
  • 18:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1031.eqiad.wmnet
  • 17:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2031.codfw.wmnet
  • 17:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1031.eqiad.wmnet
  • 17:57 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1030.eqiad.wmnet
  • 17:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2030.codfw.wmnet
  • 17:50 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2030.codfw.wmnet
  • 17:49 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1030.eqiad.wmnet
  • 17:32 mutante: removing geoip-database from all install hosts with [cumin2002:~] $ sudo cumin 'install*' 'apt-get remove geoip-database'
  • 17:30 mutante: install2003 - apt-get remove geoip-database
  • 17:25 mutante: [webperf1002:~] $ sudo systemctl start arclamp_compress_logs (was failed with https://ms-fe.svc.eqiad.wmnet/... returning 503) but worked fine when manually starting it
  • 17:25 mutante: [webperf1002:~] $ sudo systemctl status arclamp_compress_logs
  • 17:20 mutante: install1003 - apt-get remove geoip-database libgeoip1 and running puppet. I don't see why these are installed here
  • 15:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1041.eqiad.wmnet with OS bullseye
  • 15:42 sukhe: enable puppet on A:dns-rec and A:wikidough
  • 15:10 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2029.codfw.wmnet
  • 15:09 sukhe: disable puppet on A:wikidough to deploy CR 779936
  • 15:08 sukhe: disable puppet on A:dns-rec to deploy CR 779936
  • 15:05 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2029.codfw.wmnet
  • 15:01 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1041.eqiad.wmnet with reason: host reimage
  • 14:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1041.eqiad.wmnet with reason: host reimage
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
  • 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2012.codfw.wmnet: Restarting for cert refresh - hnowlan@cumin1001
  • 14:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1029.eqiad.wmnet
  • 14:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1041.eqiad.wmnet with OS bullseye
  • 14:46 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be1041.eqiad.wmnet with OS bullseye
  • 14:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
  • 14:42 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1029.eqiad.wmnet
  • 14:40 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2012.codfw.wmnet: Restarting for cert refresh - hnowlan@cumin1001
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2026.codfw.wmnet with reason: reboot
  • 14:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2026.codfw.wmnet with reason: reboot
  • 14:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase101[6-8].eqiad.wmnet: Restarting for cert refresh - hnowlan@cumin1001
  • 14:27 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1041.eqiad.wmnet with OS bullseye
  • 14:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2043.codfw.wmnet with OS bullseye
  • 14:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1028.eqiad.wmnet
  • 14:11 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1028.eqiad.wmnet
  • 14:07 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase101[6-8].eqiad.wmnet: Restarting for cert refresh - hnowlan@cumin1001
  • 14:03 vgutierrez: upgrade haproxy to 2.4.16 on cp3050 - T307444
  • 14:02 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2028.codfw.wmnet
  • 13:56 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2028.codfw.wmnet
  • 13:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1027.eqiad.wmnet
  • 13:54 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2027.codfw.wmnet
  • 13:47 moritzm: stopped/masked coal/navtiming on webperf1001/webperf2001 T305460
  • 13:46 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2027.codfw.wmnet
  • 13:45 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1027.eqiad.wmnet
  • 13:45 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1002.eqiad.wmnet with OS bullseye
  • 13:40 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2043.codfw.wmnet with reason: host reimage
  • 13:36 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2043.codfw.wmnet with reason: host reimage
  • 13:28 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1002.eqiad.wmnet with reason: host reimage
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:25 Lucas_WMDE: UTC afternoon backport window done
  • 13:25 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1002.eqiad.wmnet with reason: host reimage
  • 13:22 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2043.codfw.wmnet with OS bullseye
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Use "unexpectedUnconnectedPage" page prop everywhere (duration: 00m 51s)
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused maps event stream (T293366) (duration: 01m 04s)
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:09 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
  • 13:08 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1002.eqiad.wmnet with OS bullseye
  • 13:08 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
  • 13:08 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
  • 13:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2026.codfw.wmnet
  • 13:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1026.eqiad.wmnet
  • 12:58 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2026.codfw.wmnet
  • 12:58 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1026.eqiad.wmnet
  • 12:57 vgutierrez: rolling downgrade of HAProxy to version 2.4.15 on upload - T307444
  • 10:58 vgutierrez: rolling downgrade of HAProxy to version 2.4.15 on text - T307444
  • 10:57 jbond: restrict ports allowed via squid
  • 10:46 vgutierrez: downgrade haproxy 2.4 package to version 2.4.15 on apt.wm.o (buster-wikimedia)
  • 09:47 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb2001-dev.codfw.wmnet
  • 09:40 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb2001-dev.codfw.wmnet
  • 09:38 gehel: resetting BMC on relforge1003 and relforge1004 - https://wikitech.wikimedia.org/wiki/Management_Interfaces#From_local_IPMI
  • 09:32 vgutierrez: rolling upgrade of HAProxy in eqiad
  • 09:14 marostegui: Disable puppet on clouddb1013 clouddb1016 clouddb1020 T305974
  • 09:14 marostegui: Disable puppet on clouddb1013 clouddb1016 clouddb1020 T305974
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:51 hashar@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.10 refs T305216 (duration: 30m 44s)
  • 08:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2001.codfw.wmnet with OS buster
  • 08:44 vgutierrez: rolling upgrade of HAProxy in esams
  • 08:29 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1002.eqiad.wmnet
  • 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:24 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1002.eqiad.wmnet
  • 08:21 hashar@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.10 refs T305216
  • 08:14 hashar: Starting MediaWiki train deployment using `scap stage-train 1.39.0-wmf.10` # T305216
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1132 T301879', diff saved to https://phabricator.wikimedia.org/P27350 and previous config saved to /var/cache/conftool/dbconfig/20220503-080421-marostegui.json
  • 08:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2001.codfw.wmnet with reason: host reimage
  • 07:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2001.codfw.wmnet with reason: host reimage
  • 07:33 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores2001.codfw.wmnet with OS buster
  • 07:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:20 taavi: UTC morning deploys done
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nikki Nikkhoui out of all services on: 1224 hosts
  • 07:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nikki Nikkhoui out of all services on: 1224 hosts
  • 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nikki Nikkhoui out of all services on: 513 hosts
  • 07:18 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nikki Nikkhoui out of all services on: 513 hosts
  • 07:15 awight@deploy1002: Synchronized wmf-config: Config: Enable the versioned mapdata API (T307110) (duration: 00m 48s)
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-05-02

  • 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb2002.codfw.wmnet with OS bullseye
  • 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS bullseye
  • 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
  • 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage
  • 22:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage
  • 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host krb2002.codfw.wmnet with OS bullseye
  • 22:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS bullseye
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:49 catrope@deploy1002: Finished scap: Backport: [TOC] Remove pointer-events:none on .sidebar-toc-link (T307271) and Video landing page: Show different title/body text on mobile (T303785) (duration: 11m 45s)
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:46 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 20:44 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 20:42 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 20:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2012.codfw.wmnet with OS bullseye
  • 20:40 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 20:38 catrope@deploy1002: Started scap: Backport: [TOC] Remove pointer-events:none on .sidebar-toc-link (T307271) and Video landing page: Show different title/body text on mobile (T303785)
  • 20:32 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 20:31 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 20:30 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Revert "Start writing to cuc_actor in guwwiki and shnwikivoyage"" (T233004) (duration: 00m 47s)
  • 20:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
  • 20:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS bullseye
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2012.codfw.wmnet with OS bullseye
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: cirrus: Update MLR models to 20220421 deployment (T306123) (duration: 00m 48s)
  • 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:13 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host logstash2025.codfw.wmnet
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1025.eqiad.wmnet
  • 20:09 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: zhwiki: Update zh-hans version tagline and wordmark files (T276694) (duration: 00m 47s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:08 catrope@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-tagline-zh-hans.svg: Config: zhwiki: Update zh-hans version tagline and wordmark files (T276694) (duration: 00m 47s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:07 catrope@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-zh-hans.svg: Config: zhwiki: Update zh-hans version tagline and wordmark files (T276694) (duration: 00m 48s)
  • 20:05 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9fa5d7e]: Fix app_session_metrics [airflow-dags/analytics@9fa5d7e] (duration: 00m 09s)
  • 20:04 aqu@deploy1002: Started deploy [airflow-dags/analytics@9fa5d7e]: Fix app_session_metrics [airflow-dags/analytics@9fa5d7e]
  • 20:04 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2025.codfw.wmnet
  • 20:04 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1025.eqiad.wmnet
  • 20:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2024.codfw.wmnet
  • 20:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1024.eqiad.wmnet
  • 19:59 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2024.codfw.wmnet
  • 19:59 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1024.eqiad.wmnet
  • 19:58 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1023.eqiad.wmnet
  • 19:58 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host logstash2023.codfw.wmnet
  • 19:56 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host krb2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2011.codfw.wmnet with OS bullseye
  • 19:49 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2023.codfw.wmnet
  • 19:49 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1023.eqiad.wmnet
  • 19:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS bullseye
  • 19:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
  • 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2003.codfw.wmnet
  • 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2010.codfw.wmnet with OS bullseye
  • 19:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
  • 19:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1012.eqiad.wmnet
  • 19:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:30 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2003.codfw.wmnet
  • 19:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS bullseye
  • 19:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2002.codfw.wmnet
  • 19:28 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1012.eqiad.wmnet
  • 19:26 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1011.eqiad.wmnet
  • 19:24 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2002.codfw.wmnet
  • 19:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
  • 19:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2001.codfw.wmnet
  • 19:22 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1011.eqiad.wmnet
  • 19:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash1010.eqiad.wmnet
  • 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
  • 19:18 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2001.codfw.wmnet
  • 19:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2011.codfw.wmnet with OS bullseye
  • 19:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS bullseye
  • 19:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2011.codfw.wmnet with OS bullseye
  • 19:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS bullseye
  • 19:14 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash1010.eqiad.wmnet
  • 19:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS bullseye
  • 19:12 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2011.codfw.wmnet with OS bullseye
  • 19:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2010.codfw.wmnet with OS bullseye
  • 18:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS bullseye
  • 18:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS bullseye
  • 18:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2009.codfw.wmnet with OS bullseye
  • 18:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2008.codfw.wmnet with OS bullseye
  • 18:24 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_purge_expired_blocks - starting new timer for T257473
  • 18:24 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_purge_expired_blocks
  • 18:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
  • 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
  • 18:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
  • 18:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
  • 18:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS bullseye
  • 18:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS bullseye
  • 18:06 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2286.codfw.wmnet
  • 18:06 mutante: repooling mw2286 after hardware repair - T306823
  • 18:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2009.codfw.wmnet with OS bullseye
  • 17:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2008.codfw.wmnet with OS bullseye
  • 17:51 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f94bb01]: T306123: adjust uploaded models to always have a positive score (duration: 00m 45s)
  • 17:50 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f94bb01]: T306123: adjust uploaded models to always have a positive score
  • 17:46 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1018.eqiad.wmnet
  • 17:45 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:42 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 17:38 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1018.eqiad.wmnet
  • 17:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS bullseye
  • 17:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS bullseye
  • 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2006.codfw.wmnet with OS bullseye
  • 17:05 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1001.eqiad.wmnet
  • 17:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2007.codfw.wmnet with OS bullseye
  • 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
  • 16:56 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1001.eqiad.wmnet
  • 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
  • 16:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
  • 16:48 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 09s)
  • 16:48 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 16:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS bullseye
  • 16:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
  • 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS bullseye
  • 16:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2006.codfw.wmnet with OS bullseye
  • 16:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2007.codfw.wmnet with OS bullseye
  • 16:06 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw2286.codfw.wmnet
  • 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS bullseye
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS bullseye
  • 16:02 jelto: mw2286: scap pull and recheck icinga checks. Server came up after hardware failure
  • 16:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2005.codfw.wmnet with OS bullseye
  • 15:56 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 27s)
  • 15:56 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
  • 15:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
  • 15:47 andrewbogott: upgrading wikitech-static to REL1_38. this includes a hotfix of https://gerrit.wikimedia.org/r/c/operations/wikitech-static/+/788370
  • 15:47 vgutierrez: rolling upgrade of HAProxy in eqsin
  • 15:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS bullseye
  • 15:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2005.codfw.wmnet with OS bullseye
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2002.codfw.wmnet with OS buster
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host aqs2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS bullseye
  • 14:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs2005.codfw.wmnet with OS bullseye
  • 14:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS bullseye
  • 14:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2002.codfw.wmnet with reason: host reimage
  • 14:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2002.codfw.wmnet with reason: host reimage
  • 14:13 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores2002.codfw.wmnet with OS buster
  • 14:11 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudbackup2001.codfw.wmnet
  • 14:11 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
  • 14:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2004.codfw.wmnet with OS bullseye
  • 14:05 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudbackup2001.codfw.wmnet
  • 13:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:59 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation in testwiki for af, as, gu, kn, mk and sr (T304828, T304858) (duration: 00m 49s)
  • 13:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
  • 13:54 rook@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 118 hosts with reason: upgrading openstack
  • 13:53 rook@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 118 hosts with reason: upgrading openstack
  • 13:53 rook@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: upgrading openstack
  • 13:52 rook@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: upgrading openstack
  • 13:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS bullseye
  • 13:49 vgutierrez: rolling upgrade of HAProxy in codfw
  • 13:48 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 18s)
  • 13:48 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 13:41 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
  • 13:13 godog: start removal of 'tegola-swift-container' and its objects - T307184
  • 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: Deploying schema change to s2@codfw T303603
  • 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: Deploying schema change to s2@codfw T303603
  • 12:57 vgutierrez: rolling upgrade of HAProxy in drmrs
  • 12:55 kormat: dbmaint Deploying schema change to s2@codfw (T303603)
  • 12:48 volans: swapped /srv/deployment directory on deploy1002 with the one from the latest backup - T307349
  • 12:45 kormat: dbmaint Deploying schema change to s2 (T303603)
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298563)', diff saved to https://phabricator.wikimedia.org/P27349 and previous config saved to /var/cache/conftool/dbconfig/20220502-121018-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27347 and previous config saved to /var/cache/conftool/dbconfig/20220502-115513-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P27346 and previous config saved to /var/cache/conftool/dbconfig/20220502-114007-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298563)', diff saved to https://phabricator.wikimedia.org/P27345 and previous config saved to /var/cache/conftool/dbconfig/20220502-112502-ladsgroup.json
  • 11:11 vgutierrez: rolling upgrade of HAProxy in ulsfo
  • 11:07 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 45s)
  • 11:06 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 11:04 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 05s)
  • 11:04 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 11:01 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 19s)
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T298563)', diff saved to https://phabricator.wikimedia.org/P27344 and previous config saved to /var/cache/conftool/dbconfig/20220502-110041-ladsgroup.json
  • 11:00 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 11:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298563)', diff saved to https://phabricator.wikimedia.org/P27343 and previous config saved to /var/cache/conftool/dbconfig/20220502-110033-ladsgroup.json
  • 10:59 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 10:58 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 40s)
  • 10:57 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 10:57 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 06s)
  • 10:57 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 10:49 klausman@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 05s)
  • 10:48 klausman@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 10:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P27342 and previous config saved to /var/cache/conftool/dbconfig/20220502-104528-ladsgroup.json
  • 10:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:43 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
  • 10:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 10:42 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
  • 10:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P27341 and previous config saved to /var/cache/conftool/dbconfig/20220502-103023-ladsgroup.json
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T306560)', diff saved to https://phabricator.wikimedia.org/P27340 and previous config saved to /var/cache/conftool/dbconfig/20220502-102402-ladsgroup.json
  • 10:19 klausman@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ores2002.codfw.wmnet with OS buster
  • 10:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2002.codfw.wmnet with reason: host reimage
  • 10:15 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2002.codfw.wmnet with reason: host reimage
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298563)', diff saved to https://phabricator.wikimedia.org/P27338 and previous config saved to /var/cache/conftool/dbconfig/20220502-101518-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P27337 and previous config saved to /var/cache/conftool/dbconfig/20220502-100857-ladsgroup.json
  • 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P27334 and previous config saved to /var/cache/conftool/dbconfig/20220502-095352-ladsgroup.json
  • 09:50 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ores2002.codfw.wmnet with OS buster
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T298563)', diff saved to https://phabricator.wikimedia.org/P27333 and previous config saved to /var/cache/conftool/dbconfig/20220502-094938-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298563)', diff saved to https://phabricator.wikimedia.org/P27332 and previous config saved to /var/cache/conftool/dbconfig/20220502-094930-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T306560)', diff saved to https://phabricator.wikimedia.org/P27331 and previous config saved to /var/cache/conftool/dbconfig/20220502-093847-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T306560)', diff saved to https://phabricator.wikimedia.org/P27330 and previous config saved to /var/cache/conftool/dbconfig/20220502-093628-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P27329 and previous config saved to /var/cache/conftool/dbconfig/20220502-093547-ladsgroup.json
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P27328 and previous config saved to /var/cache/conftool/dbconfig/20220502-093425-ladsgroup.json
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P27327 and previous config saved to /var/cache/conftool/dbconfig/20220502-092042-ladsgroup.json
  • 09:19 moritzm: installing ghostscript security updates on Stretch (newer distros not affected)
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P27326 and previous config saved to /var/cache/conftool/dbconfig/20220502-091920-ladsgroup.json
  • 09:06 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
  • 09:05 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P27325 and previous config saved to /var/cache/conftool/dbconfig/20220502-090537-ladsgroup.json
  • 09:05 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298563)', diff saved to https://phabricator.wikimedia.org/P27324 and previous config saved to /var/cache/conftool/dbconfig/20220502-090415-ladsgroup.json
  • 09:04 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
  • 09:03 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
  • 09:01 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
  • 09:01 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
  • 09:00 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS bullseye
  • 09:00 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS bullseye
  • 09:00 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1002.eqiad.wmnet with OS buster
  • 08:57 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS buster
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P27323 and previous config saved to /var/cache/conftool/dbconfig/20220502-085032-ladsgroup.json
  • 08:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P27322 and previous config saved to /var/cache/conftool/dbconfig/20220502-084812-ladsgroup.json
  • 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T306560)', diff saved to https://phabricator.wikimedia.org/P27321 and previous config saved to /var/cache/conftool/dbconfig/20220502-084747-ladsgroup.json
  • 08:47 vgutierrez: test HAProxy 2.4.16 on cp4034 and cp4036
  • 08:46 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia
  • 08:45 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1002.eqiad.wmnet with OS buster
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P27320 and previous config saved to /var/cache/conftool/dbconfig/20220502-084200-ladsgroup.json
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T298563)', diff saved to https://phabricator.wikimedia.org/P27319 and previous config saved to /var/cache/conftool/dbconfig/20220502-083456-ladsgroup.json
  • 08:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298563)', diff saved to https://phabricator.wikimedia.org/P27318 and previous config saved to /var/cache/conftool/dbconfig/20220502-083442-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P27317 and previous config saved to /var/cache/conftool/dbconfig/20220502-083242-ladsgroup.json
  • 08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2001.codfw.wmnet with OS buster
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P27316 and previous config saved to /var/cache/conftool/dbconfig/20220502-082654-ladsgroup.json
  • 08:24 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 02m 06s)
  • 08:22 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P27315 and previous config saved to /var/cache/conftool/dbconfig/20220502-081937-ladsgroup.json
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P27314 and previous config saved to /var/cache/conftool/dbconfig/20220502-081737-ladsgroup.json
  • 08:16 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1002.eqiad.wmnet with OS buster
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P27313 and previous config saved to /var/cache/conftool/dbconfig/20220502-081149-ladsgroup.json
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298295)', diff saved to https://phabricator.wikimedia.org/P27312 and previous config saved to /var/cache/conftool/dbconfig/20220502-080513-ladsgroup.json
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P27311 and previous config saved to /var/cache/conftool/dbconfig/20220502-080432-ladsgroup.json
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T306560)', diff saved to https://phabricator.wikimedia.org/P27310 and previous config saved to /var/cache/conftool/dbconfig/20220502-080232-ladsgroup.json
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T306560)', diff saved to https://phabricator.wikimedia.org/P27309 and previous config saved to /var/cache/conftool/dbconfig/20220502-080012-ladsgroup.json
  • 08:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T306560)', diff saved to https://phabricator.wikimedia.org/P27308 and previous config saved to /var/cache/conftool/dbconfig/20220502-075957-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P27307 and previous config saved to /var/cache/conftool/dbconfig/20220502-075644-ladsgroup.json
  • 07:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2001.codfw.wmnet with reason: host reimage
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27306 and previous config saved to /var/cache/conftool/dbconfig/20220502-075008-ladsgroup.json
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298563)', diff saved to https://phabricator.wikimedia.org/P27305 and previous config saved to /var/cache/conftool/dbconfig/20220502-074927-ladsgroup.json
  • 07:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2001.codfw.wmnet with reason: host reimage
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P27304 and previous config saved to /var/cache/conftool/dbconfig/20220502-074452-ladsgroup.json
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T298565)', diff saved to https://phabricator.wikimedia.org/P27303 and previous config saved to /var/cache/conftool/dbconfig/20220502-074006-ladsgroup.json
  • 07:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27302 and previous config saved to /var/cache/conftool/dbconfig/20220502-073958-ladsgroup.json
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27301 and previous config saved to /var/cache/conftool/dbconfig/20220502-073503-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P27300 and previous config saved to /var/cache/conftool/dbconfig/20220502-072947-ladsgroup.json
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P27299 and previous config saved to /var/cache/conftool/dbconfig/20220502-072452-ladsgroup.json
  • 07:23 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores2001.codfw.wmnet with OS buster
  • 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T298563)', diff saved to https://phabricator.wikimedia.org/P27298 and previous config saved to /var/cache/conftool/dbconfig/20220502-072303-ladsgroup.json
  • 07:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 07:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27297 and previous config saved to /var/cache/conftool/dbconfig/20220502-072255-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298295)', diff saved to https://phabricator.wikimedia.org/P27296 and previous config saved to /var/cache/conftool/dbconfig/20220502-071958-ladsgroup.json
  • 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T298295)', diff saved to https://phabricator.wikimedia.org/P27295 and previous config saved to /var/cache/conftool/dbconfig/20220502-071741-ladsgroup.json
  • 07:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 07:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27294 and previous config saved to /var/cache/conftool/dbconfig/20220502-071728-ladsgroup.json
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T306560)', diff saved to https://phabricator.wikimedia.org/P27293 and previous config saved to /var/cache/conftool/dbconfig/20220502-071442-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T306560)', diff saved to https://phabricator.wikimedia.org/P27292 and previous config saved to /var/cache/conftool/dbconfig/20220502-071222-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T306560)', diff saved to https://phabricator.wikimedia.org/P27291 and previous config saved to /var/cache/conftool/dbconfig/20220502-071214-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P27290 and previous config saved to /var/cache/conftool/dbconfig/20220502-070947-ladsgroup.json
  • 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P27289 and previous config saved to /var/cache/conftool/dbconfig/20220502-070750-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27288 and previous config saved to /var/cache/conftool/dbconfig/20220502-070222-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P27287 and previous config saved to /var/cache/conftool/dbconfig/20220502-065709-ladsgroup.json
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27286 and previous config saved to /var/cache/conftool/dbconfig/20220502-065442-ladsgroup.json
  • 06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P27285 and previous config saved to /var/cache/conftool/dbconfig/20220502-065245-ladsgroup.json
  • 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27284 and previous config saved to /var/cache/conftool/dbconfig/20220502-064717-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P27283 and previous config saved to /var/cache/conftool/dbconfig/20220502-064204-ladsgroup.json
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27282 and previous config saved to /var/cache/conftool/dbconfig/20220502-063837-ladsgroup.json
  • 06:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27281 and previous config saved to /var/cache/conftool/dbconfig/20220502-063740-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27280 and previous config saved to /var/cache/conftool/dbconfig/20220502-063212-ladsgroup.json
  • 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27279 and previous config saved to /var/cache/conftool/dbconfig/20220502-063055-ladsgroup.json
  • 06:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27278 and previous config saved to /var/cache/conftool/dbconfig/20220502-063047-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T306560)', diff saved to https://phabricator.wikimedia.org/P27277 and previous config saved to /var/cache/conftool/dbconfig/20220502-062659-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T306560)', diff saved to https://phabricator.wikimedia.org/P27276 and previous config saved to /var/cache/conftool/dbconfig/20220502-062139-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T306560)', diff saved to https://phabricator.wikimedia.org/P27275 and previous config saved to /var/cache/conftool/dbconfig/20220502-062131-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27274 and previous config saved to /var/cache/conftool/dbconfig/20220502-062131-ladsgroup.json
  • 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27273 and previous config saved to /var/cache/conftool/dbconfig/20220502-061540-ladsgroup.json
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P27272 and previous config saved to /var/cache/conftool/dbconfig/20220502-060626-ladsgroup.json
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P27271 and previous config saved to /var/cache/conftool/dbconfig/20220502-060626-ladsgroup.json
  • 06:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27270 and previous config saved to /var/cache/conftool/dbconfig/20220502-060035-ladsgroup.json
  • 05:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 05:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 05:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 05:53 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: TimedMediaHandler: Make videojs the only player everywhere (T248418) (duration: 00m 47s)
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P27269 and previous config saved to /var/cache/conftool/dbconfig/20220502-055121-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P27268 and previous config saved to /var/cache/conftool/dbconfig/20220502-055121-ladsgroup.json
  • 05:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27267 and previous config saved to /var/cache/conftool/dbconfig/20220502-054530-ladsgroup.json
  • 05:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27266 and previous config saved to /var/cache/conftool/dbconfig/20220502-054313-ladsgroup.json
  • 05:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 05:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 05:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298295)', diff saved to https://phabricator.wikimedia.org/P27265 and previous config saved to /var/cache/conftool/dbconfig/20220502-054305-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27264 and previous config saved to /var/cache/conftool/dbconfig/20220502-054040-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27263 and previous config saved to /var/cache/conftool/dbconfig/20220502-053615-ladsgroup.json
  • 05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T306560)', diff saved to https://phabricator.wikimedia.org/P27262 and previous config saved to /var/cache/conftool/dbconfig/20220502-053357-ladsgroup.json
  • 05:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T306560)', diff saved to https://phabricator.wikimedia.org/P27261 and previous config saved to /var/cache/conftool/dbconfig/20220502-053349-ladsgroup.json
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27260 and previous config saved to /var/cache/conftool/dbconfig/20220502-052800-ladsgroup.json
  • 05:20 Amir1: killed bnwiki's refresh links recommendation (T299021)
  • 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P27259 and previous config saved to /var/cache/conftool/dbconfig/20220502-051844-ladsgroup.json
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27258 and previous config saved to /var/cache/conftool/dbconfig/20220502-051402-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27257 and previous config saved to /var/cache/conftool/dbconfig/20220502-051255-ladsgroup.json
  • 05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P27256 and previous config saved to /var/cache/conftool/dbconfig/20220502-050339-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298295)', diff saved to https://phabricator.wikimedia.org/P27255 and previous config saved to /var/cache/conftool/dbconfig/20220502-045750-ladsgroup.json
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27254 and previous config saved to /var/cache/conftool/dbconfig/20220502-045656-ladsgroup.json
  • 04:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 04:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T298295)', diff saved to https://phabricator.wikimedia.org/P27253 and previous config saved to /var/cache/conftool/dbconfig/20220502-045532-ladsgroup.json
  • 04:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 04:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 04:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 04:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 04:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 04:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T306560)', diff saved to https://phabricator.wikimedia.org/P27252 and previous config saved to /var/cache/conftool/dbconfig/20220502-044834-ladsgroup.json
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T306560)', diff saved to https://phabricator.wikimedia.org/P27251 and previous config saved to /var/cache/conftool/dbconfig/20220502-044614-ladsgroup.json
  • 04:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 04:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P27250 and previous config saved to /var/cache/conftool/dbconfig/20220502-044606-ladsgroup.json
  • 04:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P27249 and previous config saved to /var/cache/conftool/dbconfig/20220502-044151-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P27248 and previous config saved to /var/cache/conftool/dbconfig/20220502-043101-ladsgroup.json
  • 04:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P27247 and previous config saved to /var/cache/conftool/dbconfig/20220502-042646-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P27246 and previous config saved to /var/cache/conftool/dbconfig/20220502-041556-ladsgroup.json
  • 04:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27245 and previous config saved to /var/cache/conftool/dbconfig/20220502-041141-ladsgroup.json
  • 04:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
  • 04:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
  • 04:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 04:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T298295)', diff saved to https://phabricator.wikimedia.org/P27244 and previous config saved to /var/cache/conftool/dbconfig/20220502-040908-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T298295)', diff saved to https://phabricator.wikimedia.org/P27243 and previous config saved to /var/cache/conftool/dbconfig/20220502-040754-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 04:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27242 and previous config saved to /var/cache/conftool/dbconfig/20220502-040745-ladsgroup.json
  • 04:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P27241 and previous config saved to /var/cache/conftool/dbconfig/20220502-040051-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T306560)', diff saved to https://phabricator.wikimedia.org/P27240 and previous config saved to /var/cache/conftool/dbconfig/20220502-035830-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 03:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T306560)', diff saved to https://phabricator.wikimedia.org/P27239 and previous config saved to /var/cache/conftool/dbconfig/20220502-035733-ladsgroup.json
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298565)', diff saved to https://phabricator.wikimedia.org/P27238 and previous config saved to /var/cache/conftool/dbconfig/20220502-035522-ladsgroup.json
  • 03:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 03:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P27237 and previous config saved to /var/cache/conftool/dbconfig/20220502-035514-ladsgroup.json
  • 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27236 and previous config saved to /var/cache/conftool/dbconfig/20220502-035240-ladsgroup.json
  • 03:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T306560)', diff saved to https://phabricator.wikimedia.org/P27235 and previous config saved to /var/cache/conftool/dbconfig/20220502-034657-ladsgroup.json
  • 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P27234 and previous config saved to /var/cache/conftool/dbconfig/20220502-034228-ladsgroup.json
  • 03:41 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set testwiki to READ NEW for templatelinks migration (T306673) (duration: 00m 49s)
  • 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P27233 and previous config saved to /var/cache/conftool/dbconfig/20220502-034009-ladsgroup.json
  • 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27232 and previous config saved to /var/cache/conftool/dbconfig/20220502-033735-ladsgroup.json
  • 03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P27231 and previous config saved to /var/cache/conftool/dbconfig/20220502-033152-ladsgroup.json
  • 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P27230 and previous config saved to /var/cache/conftool/dbconfig/20220502-032723-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P27229 and previous config saved to /var/cache/conftool/dbconfig/20220502-032504-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27228 and previous config saved to /var/cache/conftool/dbconfig/20220502-032229-ladsgroup.json
  • 03:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T298295)', diff saved to https://phabricator.wikimedia.org/P27227 and previous config saved to /var/cache/conftool/dbconfig/20220502-032011-ladsgroup.json
  • 03:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 03:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 03:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27226 and previous config saved to /var/cache/conftool/dbconfig/20220502-031944-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P27225 and previous config saved to /var/cache/conftool/dbconfig/20220502-031646-ladsgroup.json
  • 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T306560)', diff saved to https://phabricator.wikimedia.org/P27224 and previous config saved to /var/cache/conftool/dbconfig/20220502-031218-ladsgroup.json
  • 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T306560)', diff saved to https://phabricator.wikimedia.org/P27223 and previous config saved to /var/cache/conftool/dbconfig/20220502-030958-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 03:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 03:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 03:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 03:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P27222 and previous config saved to /var/cache/conftool/dbconfig/20220502-030439-ladsgroup.json
  • 03:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T306560)', diff saved to https://phabricator.wikimedia.org/P27221 and previous config saved to /var/cache/conftool/dbconfig/20220502-030141-ladsgroup.json
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T306560)', diff saved to https://phabricator.wikimedia.org/P27220 and previous config saved to /var/cache/conftool/dbconfig/20220502-025930-ladsgroup.json
  • 02:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 02:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P27219 and previous config saved to /var/cache/conftool/dbconfig/20220502-024934-ladsgroup.json
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T298565)', diff saved to https://phabricator.wikimedia.org/P27218 and previous config saved to /var/cache/conftool/dbconfig/20220502-023556-ladsgroup.json
  • 02:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27217 and previous config saved to /var/cache/conftool/dbconfig/20220502-023543-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27216 and previous config saved to /var/cache/conftool/dbconfig/20220502-023429-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27215 and previous config saved to /var/cache/conftool/dbconfig/20220502-022038-ladsgroup.json
  • 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27214 and previous config saved to /var/cache/conftool/dbconfig/20220502-020533-ladsgroup.json
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27213 and previous config saved to /var/cache/conftool/dbconfig/20220502-015028-ladsgroup.json
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27212 and previous config saved to /var/cache/conftool/dbconfig/20220502-013641-ladsgroup.json
  • 01:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 01:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27211 and previous config saved to /var/cache/conftool/dbconfig/20220502-013633-ladsgroup.json
  • 01:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27210 and previous config saved to /var/cache/conftool/dbconfig/20220502-013316-ladsgroup.json
  • 01:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 01:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27209 and previous config saved to /var/cache/conftool/dbconfig/20220502-012950-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P27208 and previous config saved to /var/cache/conftool/dbconfig/20220502-012128-ladsgroup.json
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27207 and previous config saved to /var/cache/conftool/dbconfig/20220502-011607-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27206 and previous config saved to /var/cache/conftool/dbconfig/20220502-011445-ladsgroup.json
  • 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P27205 and previous config saved to /var/cache/conftool/dbconfig/20220502-010623-ladsgroup.json
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27204 and previous config saved to /var/cache/conftool/dbconfig/20220502-010102-ladsgroup.json
  • 00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P27203 and previous config saved to /var/cache/conftool/dbconfig/20220502-005940-ladsgroup.json
  • 00:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27202 and previous config saved to /var/cache/conftool/dbconfig/20220502-005118-ladsgroup.json
  • 00:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27201 and previous config saved to /var/cache/conftool/dbconfig/20220502-004557-ladsgroup.json
  • 00:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27200 and previous config saved to /var/cache/conftool/dbconfig/20220502-004435-ladsgroup.json
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27199 and previous config saved to /var/cache/conftool/dbconfig/20220502-004222-ladsgroup.json
  • 00:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 00:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 00:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27198 and previous config saved to /var/cache/conftool/dbconfig/20220502-003052-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27197 and previous config saved to /var/cache/conftool/dbconfig/20220502-001449-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T298563)', diff saved to https://phabricator.wikimedia.org/P27196 and previous config saved to /var/cache/conftool/dbconfig/20220502-000151-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 00:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance

2022-05-01

  • 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27195 and previous config saved to /var/cache/conftool/dbconfig/20220501-235742-ladsgroup.json
  • 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T306560)', diff saved to https://phabricator.wikimedia.org/P27194 and previous config saved to /var/cache/conftool/dbconfig/20220501-235700-ladsgroup.json
  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T306560)', diff saved to https://phabricator.wikimedia.org/P27193 and previous config saved to /var/cache/conftool/dbconfig/20220501-235549-ladsgroup.json
  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 23:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 23:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 23:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T306560)', diff saved to https://phabricator.wikimedia.org/P27192 and previous config saved to /var/cache/conftool/dbconfig/20220501-235443-ladsgroup.json
  • 23:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27191 and previous config saved to /var/cache/conftool/dbconfig/20220501-234237-ladsgroup.json
  • 23:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27190 and previous config saved to /var/cache/conftool/dbconfig/20220501-233938-ladsgroup.json
  • 23:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27189 and previous config saved to /var/cache/conftool/dbconfig/20220501-232732-ladsgroup.json
  • 23:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P27188 and previous config saved to /var/cache/conftool/dbconfig/20220501-232433-ladsgroup.json
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27187 and previous config saved to /var/cache/conftool/dbconfig/20220501-231227-ladsgroup.json
  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T306560)', diff saved to https://phabricator.wikimedia.org/P27186 and previous config saved to /var/cache/conftool/dbconfig/20220501-230928-ladsgroup.json
  • 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T306560)', diff saved to https://phabricator.wikimedia.org/P27185 and previous config saved to /var/cache/conftool/dbconfig/20220501-230715-ladsgroup.json
  • 23:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 23:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27184 and previous config saved to /var/cache/conftool/dbconfig/20220501-230707-ladsgroup.json
  • 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27183 and previous config saved to /var/cache/conftool/dbconfig/20220501-225626-ladsgroup.json
  • 22:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 22:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27182 and previous config saved to /var/cache/conftool/dbconfig/20220501-225202-ladsgroup.json
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27181 and previous config saved to /var/cache/conftool/dbconfig/20220501-223920-ladsgroup.json
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P27180 and previous config saved to /var/cache/conftool/dbconfig/20220501-223657-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27179 and previous config saved to /var/cache/conftool/dbconfig/20220501-222415-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27178 and previous config saved to /var/cache/conftool/dbconfig/20220501-222152-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27177 and previous config saved to /var/cache/conftool/dbconfig/20220501-221938-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27176 and previous config saved to /var/cache/conftool/dbconfig/20220501-221930-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P27175 and previous config saved to /var/cache/conftool/dbconfig/20220501-220910-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27174 and previous config saved to /var/cache/conftool/dbconfig/20220501-220425-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27173 and previous config saved to /var/cache/conftool/dbconfig/20220501-215405-ladsgroup.json
  • 21:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P27172 and previous config saved to /var/cache/conftool/dbconfig/20220501-214920-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T298565)', diff saved to https://phabricator.wikimedia.org/P27171 and previous config saved to /var/cache/conftool/dbconfig/20220501-213750-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 21:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 21:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 21:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 21:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27170 and previous config saved to /var/cache/conftool/dbconfig/20220501-213415-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T306560)', diff saved to https://phabricator.wikimedia.org/P27169 and previous config saved to /var/cache/conftool/dbconfig/20220501-213203-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T306560)', diff saved to https://phabricator.wikimedia.org/P27168 and previous config saved to /var/cache/conftool/dbconfig/20220501-213155-ladsgroup.json
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27167 and previous config saved to /var/cache/conftool/dbconfig/20220501-211650-ladsgroup.json
  • 21:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 21:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P27166 and previous config saved to /var/cache/conftool/dbconfig/20220501-210145-ladsgroup.json
  • 20:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 20:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 20:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 20:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T306560)', diff saved to https://phabricator.wikimedia.org/P27165 and previous config saved to /var/cache/conftool/dbconfig/20220501-204640-ladsgroup.json
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T306560)', diff saved to https://phabricator.wikimedia.org/P27164 and previous config saved to /var/cache/conftool/dbconfig/20220501-204427-ladsgroup.json
  • 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:51 cwhite: reload nginx on conf1004/1005 to pick up cert changes
  • 06:37 cwhite: restart etcdmirror-conftool.eqiad.wmnet on conf2005
  • 05:44 mutante: puppetmaster1001 - sudo puppet cert clean etcd.eqiad.wmnet (expired)
  • 02:32 Krinkle: repool mw1340

Archives

See Server Admin Log/Archives.