Server Admin Log: Difference between revisions

== 2022-04-06 ==
* 01:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24142 and previous config saved to /var/cache/conftool/dbconfig/20220406-013420-ladsgroup.json
* 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24141 and previous config saved to /var/cache/conftool/dbconfig/20220406-011915-ladsgroup.json
* 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24140 and previous config saved to /var/cache/conftool/dbconfig/20220406-010410-ladsgroup.json
== 2022-04-05 ==
* 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24139 and previous config saved to /var/cache/conftool/dbconfig/20220405-233042-ladsgroup.json
* 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 22:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24138 and previous config saved to /var/cache/conftool/dbconfig/20220405-224352-ladsgroup.json
* 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24137 and previous config saved to /var/cache/conftool/dbconfig/20220405-222847-ladsgroup.json
* 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24136 and previous config saved to /var/cache/conftool/dbconfig/20220405-221342-ladsgroup.json
* 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24135 and previous config saved to /var/cache/conftool/dbconfig/20220405-215837-ladsgroup.json
* 21:21 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fd8b410] (duration: 06m 48s)
* 21:14 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fd8b410]
* 21:14 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410] (thin): Regular analytics weekly train THIN [analytics/refinery@fd8b410] (duration: 00m 10s)
* 21:14 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410] (thin): Regular analytics weekly train THIN [analytics/refinery@fd8b410]
* 21:13 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410]: Regular analytics weekly train [analytics/refinery@fd8b410] (duration: 22m 50s)
* 21:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6014.drmrs.wmnet
* 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24133 and previous config saved to /var/cache/conftool/dbconfig/20220405-205822-ladsgroup.json
* 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 20:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 20:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 20:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6014.drmrs.wmnet
* 20:50 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410]: Regular analytics weekly train [analytics/refinery@fd8b410]
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 urbanecm: UTC late B&C window done
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:47 mutante: puppetmaster1001 - running test downloads of geoip databases to a temp dir
* 20:47 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|8ea86349017e71dcd38bde0663cfb13e86fe127c}}: Change upload dialog automatic upload comments ([[phab:T305303|T305303]]) (duration: 00m 54s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:41 razzi: deploying refinery for https://gerrit.wikimedia.org/r/c/analytics/refinery/+/776269/
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6013.drmrs.wmnet
* 20:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|10c16c5ed46014ec6f5e771f84320441974bef6c}}: [config]: Undeploy GDI survey from EN,FR and ES wikis in PROD ([[phab:T303962|T303962]]) (duration: 00m 55s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6013.drmrs.wmnet
* 20:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6012.drmrs.wmnet
* 20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24132 and previous config saved to /var/cache/conftool/dbconfig/20220405-201315-ladsgroup.json
* 20:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6012.drmrs.wmnet
* 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24131 and previous config saved to /var/cache/conftool/dbconfig/20220405-195810-ladsgroup.json
* 19:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6011.drmrs.wmnet
* 19:49 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw(1307{{!}}1308{{!}}1309{{!}}1310{{!}}1311{{!}}1318{{!}}1334{{!}}1335{{!}}1336{{!}}1337).*
* 19:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6011.drmrs.wmnet
* 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24130 and previous config saved to /var/cache/conftool/dbconfig/20220405-194305-ladsgroup.json
* 19:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6010.drmrs.wmnet
* 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6010.drmrs.wmnet
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24129 and previous config saved to /var/cache/conftool/dbconfig/20220405-192800-ladsgroup.json
* 19:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6009.drmrs.wmnet
* 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6009.drmrs.wmnet
* 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6006.drmrs.wmnet
* 18:47 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
* 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6006.drmrs.wmnet
* 18:42 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
* 18:41 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
* 18:37 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
* 18:34 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.1-1+deb11u1_amd64.changes
* 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6005.drmrs.wmnet
* 18:28 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.1-1_amd64.changes  # [[phab:T299705|T299705]]
* 18:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2015.codfw.wmnet
* 18:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2016.codfw.wmnet
* 18:25 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2017.codfw.wmnet
* 18:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2018.codfw.wmnet
* 18:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6005.drmrs.wmnet
* 18:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2019.codfw.wmnet
* 18:23 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2015.codfw.wmnet
* 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2020.codfw.wmnet
* 18:22 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2016.codfw.wmnet
* 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24128 and previous config saved to /var/cache/conftool/dbconfig/20220405-181712-ladsgroup.json
* 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24127 and previous config saved to /var/cache/conftool/dbconfig/20220405-181658-ladsgroup.json
* 18:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6004.drmrs.wmnet
* 18:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6004.drmrs.wmnet
* 18:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
* 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24126 and previous config saved to /var/cache/conftool/dbconfig/20220405-180153-ladsgroup.json
* 18:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 17:59 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parse2020.codfw.wmnet
* 17:59 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
* 17:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2020.codfw.wmnet
* 17:58 mutante: rebooting hosts in the parse201* range, starting with parse2019, counting down
* 17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6003.drmrs.wmnet
* 17:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
* 17:56 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 17:54 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host parse2020.codfw.wmnet
* 17:53 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
* 17:52 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse201[7-9].codfw.wmnet
* 17:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse201[7-9].wmnet
* 17:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2020.wmnet
* 17:49 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6003.drmrs.wmnet
* 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4035.ulsfo.wmnet
* 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6002.drmrs.wmnet
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24125 and previous config saved to /var/cache/conftool/dbconfig/20220405-174648-ladsgroup.json
* 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6002.drmrs.wmnet
* 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4035.ulsfo.wmnet
* 17:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4033.ulsfo.wmnet
* 17:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1040.eqiad.wmnet
* 17:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1041.eqiad.wmnet
* 17:32 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1040.eqiad.wmnet
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24124 and previous config saved to /var/cache/conftool/dbconfig/20220405-173143-ladsgroup.json
* 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6001.drmrs.wmnet
* 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4033.ulsfo.wmnet
* 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4021.ulsfo.wmnet
* 17:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1042.eqiad.wmnet
* 17:28 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1041.eqiad.wmnet
* 17:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 17:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6001.drmrs.wmnet
* 17:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1043.eqiad.wmnet
* 17:23 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1042.eqiad.wmnet
* 17:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1146.eqiad.wmnet with OS buster
* 17:22 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4021.ulsfo.wmnet
* 17:21 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1044.eqiad.wmnet
* 17:21 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 17:18 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1043.eqiad.wmnet
* 17:17 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1045.eqiad.wmnet
* 17:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1044.eqiad.wmnet
* 17:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 17:13 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
* 17:12 mutante: serially rebooting hosts in the wtp104* range
* 17:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 17:09 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1045.eqiad.wmnet
* 17:08 mutante: wtp1046 - rebooting
* 17:06 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1007.eqiad.wmnet
* 17:06 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1007.eqiad.wmnet
* 17:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1007.eqiad.wmnet with OS bullseye
* 17:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
* 17:05 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 16:54 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
* 16:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 16:51 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
* 16:49 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 16:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:48 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
* 16:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:43 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 16:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:41 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:38 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 16:36 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1007.eqiad.wmnet with OS bullseye
* 16:35 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:35 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24123 and previous config saved to /var/cache/conftool/dbconfig/20220405-163454-ladsgroup.json
* 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
* 16:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Upgrade dbstore1007 to bullseye
* 16:32 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Upgrade dbstore1007 to bullseye
* 16:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1005.eqiad.wmnet
* 16:32 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1005.eqiad.wmnet
* 16:19 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1005.eqiad.wmnet with OS bullseye
* 16:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1090.eqiad.wmnet
* 16:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:07 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1005.eqiad.wmnet with reason: host reimage
* 16:02 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1005.eqiad.wmnet with reason: host reimage
* 16:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1003.eqiad.wmnet with OS bullseye
* 16:01 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1090.eqiad.wmnet
* 15:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bullseye
* 15:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1002.eqiad.wmnet with OS bullseye
* 15:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:52 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1005.eqiad.wmnet with OS bullseye
* 15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
* 15:49 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Upgrade dbstore1005 to bullseye
* 15:49 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Upgrade dbstore1005 to bullseye
* 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:47 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1003.eqiad.wmnet
* 15:47 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1003.eqiad.wmnet
* 15:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
* 15:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2024.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 15:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2024.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
* 15:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
* 15:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
* 15:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bullseye
* 15:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.5/includes: Backport: [[gerrit:777388{{!}}ParserOutputAccess: Allow calling getPO with option of not saving in PC (T285993)]] (duration: 01m 00s)
* 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1089.eqiad.wmnet
* 15:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:40 moritzm: drain ganeti2019 [[phab:T305469|T305469]]
* 15:39 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1003.eqiad.wmnet with OS bullseye
* 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bullseye
* 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1088.eqiad.wmnet
* 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bullseye
* 15:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
* 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bullseye
* 15:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
* 15:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4036.ulsfo.wmnet
* 15:26 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 15:26 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 15:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 15:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 15:25 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1003.eqiad.wmnet with reason: host reimage
* 15:23 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1088.eqiad.wmnet
* 15:23 mmandere: pool cp5007 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 15:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 15:20 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1003.eqiad.wmnet with reason: host reimage
* 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 15:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5007.eqsin.wmnet with OS buster
* 15:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bullseye
* 15:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:12 moritzm: installing atftp security updates
* 15:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2022.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 15:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2022.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3065.esams.wmnet
* 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1087.eqiad.wmnet
* 15:10 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1003.eqiad.wmnet with OS bullseye
* 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2020.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2020.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 15:02 mmandere: pool cp5013 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:01 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Taking host offline to upgrade to Bullseye
* 15:01 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Taking host offline to upgrade to Bullseye
* 15:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5013.eqsin.wmnet with OS buster
* 14:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2013.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 14:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2013.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 14:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3065.esams.wmnet
* 14:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1087.eqiad.wmnet
* 14:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:50 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4036.ulsfo.wmnet
* 14:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5007.eqsin.wmnet with reason: host reimage
* 14:49 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:49 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2012.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2012.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 14:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5007.eqsin.wmnet with reason: host reimage
* 14:44 vgutierrez: re-pool cp1086
* 14:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2011.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 14:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2011.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
* 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
* 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24122 and previous config saved to /var/cache/conftool/dbconfig/20220405-143316-ladsgroup.json
* 14:31 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
* 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster1001.eqiad.wmnet with reason: reimage
* 14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster1001.eqiad.wmnet with reason: reimage
* 14:31 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2014.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2014.codfw.wmnet with reason: Rebooting for [[phab:T303174|T303174]]
* 14:22 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5007.eqsin.wmnet with OS buster
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24121 and previous config saved to /var/cache/conftool/dbconfig/20220405-141811-ladsgroup.json
* 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:12 mmandere: depool cp5007 for reimage - [[phab:T290005|T290005]]
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5013.eqsin.wmnet with OS buster
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:775294{{!}}Enable videojs on all of DIP wikis (T248418)]] (duration: 00m 53s)
* 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24120 and previous config saved to /var/cache/conftool/dbconfig/20220405-140306-ladsgroup.json
* 13:58 mmandere: depool cp5013 for reimage - [[phab:T290005|T290005]]
* 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3004.wikimedia.org
* 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24119 and previous config saved to /var/cache/conftool/dbconfig/20220405-134801-ladsgroup.json
* 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deneb.codfw.wmnet
* 13:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1086.eqiad.wmnet
* 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3004.wikimedia.org
* 13:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Cluster re-init for new IP ranges
* 13:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Cluster re-init for new IP ranges
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
* 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deneb.codfw.wmnet
* 13:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1086.eqiad.wmnet
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster2001.codfw.wmnet with reason: reimage
* 13:23 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster2001.codfw.wmnet with reason: reimage
* 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
* 13:20 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:776257{{!}}Start writing to $wmgUdp2logDest the same value as to $wmfUdp2logDest (T45956)]] (duration: 00m 54s)
* 13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:17 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:776250{{!}}Pin CheckUser actor migration to old schema (T233004)]] (duration: 00m 54s)
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4030.ulsfo.wmnet
* 13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4030.ulsfo.wmnet
* 13:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3064.esams.wmnet
* 13:03 moritzm: installing openssl updates from buster 10.12 point release
* 13:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
* 12:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
* 12:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:54 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3064.esams.wmnet
* 12:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1085.eqiad.wmnet
* 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24117 and previous config saved to /var/cache/conftool/dbconfig/20220405-124745-ladsgroup.json
* 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24116 and previous config saved to /var/cache/conftool/dbconfig/20220405-124732-ladsgroup.json
* 12:46 mmandere: pool cp6007 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 12:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1085.eqiad.wmnet
* 12:40 mmandere: pool cp5015 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24115 and previous config saved to /var/cache/conftool/dbconfig/20220405-123227-ladsgroup.json
* 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:22 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS buster
* 12:18 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24114 and previous config saved to /var/cache/conftool/dbconfig/20220405-121722-ladsgroup.json
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5015.eqsin.wmnet with OS buster
* 11:56 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
* 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:52 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.6 refs [[phab:T305212|T305212]]
* 11:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
* 11:48 jnuche@deploy1002: Finished scap: resync wmf.6 to reapply security patches - [[phab:T305212|T305212]] (duration: 02m 50s)
* 11:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
* 11:45 jnuche@deploy1002: Started scap: resync wmf.6 to reapply security patches - [[phab:T305212|T305212]]
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 [[phab:T305427|T305427]]', diff saved to https://phabricator.wikimedia.org/P24112 and previous config saved to /var/cache/conftool/dbconfig/20220405-113944-root.json
* 11:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS buster
* 11:31 mmandere: depool cp6007 for reimage - [[phab:T290005|T290005]]
* 11:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:23 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5015.eqsin.wmnet with OS buster
* 11:15 mmandere: depool cp5015 for reimage - [[phab:T290005|T290005]]
* 11:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:10 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 11:06 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw1001.eqiad.wmnet
* 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24111 and previous config saved to /var/cache/conftool/dbconfig/20220405-110232-ladsgroup.json
* 11:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 11:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24110 and previous config saved to /var/cache/conftool/dbconfig/20220405-110224-ladsgroup.json
* 11:03 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:56 volans: installed spicerack v2.4.0 on the cumin hosts
* 10:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1001.eqiad.wmnet with OS bullseye
* 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24109 and previous config saved to /var/cache/conftool/dbconfig/20220405-104719-ladsgroup.json
* 10:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
* 10:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
* 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24108 and previous config saved to /var/cache/conftool/dbconfig/20220405-103214-ladsgroup.json
* 10:30 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:30 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
* 10:30 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudgw1001.eqiad.wmnet with OS bullseye
* 10:19 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:18 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24107 and previous config saved to /var/cache/conftool/dbconfig/20220405-101709-ladsgroup.json
* 09:49 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24105 and previous config saved to /var/cache/conftool/dbconfig/20220405-091947-ladsgroup.json
* 09:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24104 and previous config saved to /var/cache/conftool/dbconfig/20220405-091939-ladsgroup.json
* 09:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.6"
* 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24103 and previous config saved to /var/cache/conftool/dbconfig/20220405-090434-ladsgroup.json
* 08:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24102 and previous config saved to /var/cache/conftool/dbconfig/20220405-084928-ladsgroup.json
* 08:49 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
* 08:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
* 08:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 08:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:35 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
* 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24101 and previous config saved to /var/cache/conftool/dbconfig/20220405-083423-ladsgroup.json
* 08:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:31 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.6 refs [[phab:T305212|T305212]]
* 08:28 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudgw1001.eqiad.wmnet with OS bullseye
* 08:26 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dragonfly-supernode2001.codfw.wmnet
* 08:23 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
* 08:21 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.6 refs [[phab:T305212|T305212]] (duration: 42m 53s)
* 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
* 08:13 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 08:13 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 07:52 XioNoX: disable BGP to Tata in drmrs for circuit move - [[phab:T298208|T298208]]
* 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:38 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.6 refs [[phab:T305212|T305212]]
* 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24100 and previous config saved to /var/cache/conftool/dbconfig/20220405-073617-ladsgroup.json
* 07:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 07:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24099 and previous config saved to /var/cache/conftool/dbconfig/20220405-073608-ladsgroup.json
* 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24098 and previous config saved to /var/cache/conftool/dbconfig/20220405-072103-ladsgroup.json
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24097 and previous config saved to /var/cache/conftool/dbconfig/20220405-070558-ladsgroup.json
* 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24096 and previous config saved to /var/cache/conftool/dbconfig/20220405-065053-ladsgroup.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1132 [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P24095 and previous config saved to /var/cache/conftool/dbconfig/20220405-063648-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 into API for testing [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P24094 and previous config saved to /var/cache/conftool/dbconfig/20220405-060124-marostegui.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 for testing [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P24093 and previous config saved to /var/cache/conftool/dbconfig/20220405-055256-marostegui.json
* 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24092 and previous config saved to /var/cache/conftool/dbconfig/20220405-054610-ladsgroup.json
* 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24091 and previous config saved to /var/cache/conftool/dbconfig/20220405-054602-ladsgroup.json
* 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24090 and previous config saved to /var/cache/conftool/dbconfig/20220405-053057-ladsgroup.json
* 05:17 _joe_: uploading new minor version of conftool to apt for buster/bullseye (requestctl new feature)
* 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24089 and previous config saved to /var/cache/conftool/dbconfig/20220405-051552-ladsgroup.json
* 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24088 and previous config saved to /var/cache/conftool/dbconfig/20220405-050047-ladsgroup.json
* 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 for testing [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P24087 and previous config saved to /var/cache/conftool/dbconfig/20220405-043426-marostegui.json
* 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24086 and previous config saved to /var/cache/conftool/dbconfig/20220405-040309-ladsgroup.json
* 04:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 04:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
* 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24085 and previous config saved to /var/cache/conftool/dbconfig/20220405-040301-ladsgroup.json
* 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24084 and previous config saved to /var/cache/conftool/dbconfig/20220405-034756-ladsgroup.json
* 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24083 and previous config saved to /var/cache/conftool/dbconfig/20220405-033251-ladsgroup.json
* 03:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24082 and previous config saved to /var/cache/conftool/dbconfig/20220405-031745-ladsgroup.json
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24081 and previous config saved to /var/cache/conftool/dbconfig/20220405-022132-ladsgroup.json
* 02:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 02:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24080 and previous config saved to /var/cache/conftool/dbconfig/20220405-022124-ladsgroup.json
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24079 and previous config saved to /var/cache/conftool/dbconfig/20220405-020619-ladsgroup.json
* 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: [[phab:T305423|T305423]]
* 01:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: [[phab:T305423|T305423]]
* 01:57 eileen: process control config revision changed from {{Gerrit|06379640}} to {{Gerrit|25728a0e}}
* 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24078 and previous config saved to /var/cache/conftool/dbconfig/20220405-015114-ladsgroup.json
* 01:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp5002.eqsin.wmnet
* 01:42 eileen: civicrm revision changed from {{Gerrit|84c737b6}} to {{Gerrit|87bc3114}}
* 01:37 eileen: config revision changed from {{Gerrit|bb0e1af3}} to {{Gerrit|06379640}}
* 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P24077 and previous config saved to /var/cache/conftool/dbconfig/20220405-013609-ladsgroup.json
* 01:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet
* 01:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet
* 01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
* 01:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet
* 00:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4034.ulsfo.wmnet
* 00:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5016.eqsin.wmnet

Revision as of 01:34, 6 April 2022

2022-04-06

  • 01:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24142 and previous config saved to /var/cache/conftool/dbconfig/20220406-013420-ladsgroup.json
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24141 and previous config saved to /var/cache/conftool/dbconfig/20220406-011915-ladsgroup.json
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24140 and previous config saved to /var/cache/conftool/dbconfig/20220406-010410-ladsgroup.json

2022-04-05

  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24139 and previous config saved to /var/cache/conftool/dbconfig/20220405-233042-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24138 and previous config saved to /var/cache/conftool/dbconfig/20220405-224352-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24137 and previous config saved to /var/cache/conftool/dbconfig/20220405-222847-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24136 and previous config saved to /var/cache/conftool/dbconfig/20220405-221342-ladsgroup.json
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24135 and previous config saved to /var/cache/conftool/dbconfig/20220405-215837-ladsgroup.json
  • 21:21 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fd8b410] (duration: 06m 48s)
  • 21:14 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fd8b410]
  • 21:14 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410] (thin): Regular analytics weekly train THIN [analytics/refinery@fd8b410] (duration: 00m 10s)
  • 21:14 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410] (thin): Regular analytics weekly train THIN [analytics/refinery@fd8b410]
  • 21:13 razzi@deploy1002: Finished deploy [analytics/refinery@fd8b410]: Regular analytics weekly train [analytics/refinery@fd8b410] (duration: 22m 50s)
  • 21:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6014.drmrs.wmnet
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24133 and previous config saved to /var/cache/conftool/dbconfig/20220405-205822-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 20:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6014.drmrs.wmnet
  • 20:50 razzi@deploy1002: Started deploy [analytics/refinery@fd8b410]: Regular analytics weekly train [analytics/refinery@fd8b410]
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 urbanecm: UTC late B&C window done
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mutante: puppetmaster1001 - running test downloads of geoip databases to a temp dir
  • 20:47 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 8ea8634: Change upload dialog automatic upload comments (T305303) (duration: 00m 54s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:41 razzi: deploying refinery for https://gerrit.wikimedia.org/r/c/analytics/refinery/+/776269/
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6013.drmrs.wmnet
  • 20:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 10c16c5: [config]: Undeploy GDI survey from EN,FR and ES wikis in PROD (T303962) (duration: 00m 55s)
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6013.drmrs.wmnet
  • 20:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6012.drmrs.wmnet
  • 20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24132 and previous config saved to /var/cache/conftool/dbconfig/20220405-201315-ladsgroup.json
  • 20:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6012.drmrs.wmnet
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24131 and previous config saved to /var/cache/conftool/dbconfig/20220405-195810-ladsgroup.json
  • 19:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6011.drmrs.wmnet
  • 19:49 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw(1307|1308|1309|1310|1311|1318|1334|1335|1336|1337).*
  • 19:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6011.drmrs.wmnet
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24130 and previous config saved to /var/cache/conftool/dbconfig/20220405-194305-ladsgroup.json
  • 19:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6010.drmrs.wmnet
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6010.drmrs.wmnet
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24129 and previous config saved to /var/cache/conftool/dbconfig/20220405-192800-ladsgroup.json
  • 19:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6009.drmrs.wmnet
  • 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6009.drmrs.wmnet
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6006.drmrs.wmnet
  • 18:47 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6006.drmrs.wmnet
  • 18:42 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 18:41 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 18:37 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 18:34 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.1-1+deb11u1_amd64.changes
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6005.drmrs.wmnet
  • 18:28 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.1-1_amd64.changes # T299705
  • 18:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2015.codfw.wmnet
  • 18:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2016.codfw.wmnet
  • 18:25 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2017.codfw.wmnet
  • 18:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2018.codfw.wmnet
  • 18:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6005.drmrs.wmnet
  • 18:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2019.codfw.wmnet
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2015.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse2020.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2016.codfw.wmnet
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24128 and previous config saved to /var/cache/conftool/dbconfig/20220405-181712-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24127 and previous config saved to /var/cache/conftool/dbconfig/20220405-181658-ladsgroup.json
  • 18:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6004.drmrs.wmnet
  • 18:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6004.drmrs.wmnet
  • 18:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24126 and previous config saved to /var/cache/conftool/dbconfig/20220405-180153-ladsgroup.json
  • 18:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 17:59 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parse2020.codfw.wmnet
  • 17:59 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 17:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2020.codfw.wmnet
  • 17:58 mutante: rebooting hosts in the parse201* range, starting with parse2019, counting down
  • 17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6003.drmrs.wmnet
  • 17:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 17:56 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 17:54 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host parse2020.codfw.wmnet
  • 17:53 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 17:52 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse201[7-9].codfw.wmnet
  • 17:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse201[7-9].wmnet
  • 17:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse2020.wmnet
  • 17:49 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6003.drmrs.wmnet
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4035.ulsfo.wmnet
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6002.drmrs.wmnet
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24125 and previous config saved to /var/cache/conftool/dbconfig/20220405-174648-ladsgroup.json
  • 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6002.drmrs.wmnet
  • 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4035.ulsfo.wmnet
  • 17:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4033.ulsfo.wmnet
  • 17:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1040.eqiad.wmnet
  • 17:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1041.eqiad.wmnet
  • 17:32 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1040.eqiad.wmnet
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24124 and previous config saved to /var/cache/conftool/dbconfig/20220405-173143-ladsgroup.json
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6001.drmrs.wmnet
  • 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4033.ulsfo.wmnet
  • 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4021.ulsfo.wmnet
  • 17:28 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1042.eqiad.wmnet
  • 17:28 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1041.eqiad.wmnet
  • 17:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp6001.drmrs.wmnet
  • 17:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1043.eqiad.wmnet
  • 17:23 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1042.eqiad.wmnet
  • 17:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1146.eqiad.wmnet with OS buster
  • 17:22 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4021.ulsfo.wmnet
  • 17:21 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1044.eqiad.wmnet
  • 17:21 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 17:18 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1043.eqiad.wmnet
  • 17:17 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1045.eqiad.wmnet
  • 17:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1044.eqiad.wmnet
  • 17:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:13 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 17:12 mutante: serially rebooting hosts in the wtp104* range
  • 17:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:09 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1045.eqiad.wmnet
  • 17:08 mutante: wtp1046 - rebooting
  • 17:06 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1007.eqiad.wmnet
  • 17:06 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1007.eqiad.wmnet
  • 17:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1007.eqiad.wmnet with OS bullseye
  • 17:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 17:05 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:54 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
  • 16:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 16:51 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 16:49 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 16:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:48 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 16:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:43 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 16:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:41 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:39 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:38 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 16:36 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1007.eqiad.wmnet with OS bullseye
  • 16:35 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:35 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24123 and previous config saved to /var/cache/conftool/dbconfig/20220405-163454-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 16:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Upgrade dbstore1007 to bullseye
  • 16:32 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Upgrade dbstore1007 to bullseye
  • 16:32 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1005.eqiad.wmnet
  • 16:32 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1005.eqiad.wmnet
  • 16:19 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1005.eqiad.wmnet with OS bullseye
  • 16:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1090.eqiad.wmnet
  • 16:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:07 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1005.eqiad.wmnet with reason: host reimage
  • 16:02 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1005.eqiad.wmnet with reason: host reimage
  • 16:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 16:01 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1090.eqiad.wmnet
  • 15:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 15:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 15:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:52 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1005.eqiad.wmnet with OS bullseye
  • 15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 15:49 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Upgrade dbstore1005 to bullseye
  • 15:49 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Upgrade dbstore1005 to bullseye
  • 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:47 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1003.eqiad.wmnet
  • 15:47 razzi@cumin1001: START - Cookbook sre.hosts.remove-downtime for dbstore1003.eqiad.wmnet
  • 15:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 15:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2024.codfw.wmnet with reason: Rebooting for T303174
  • 15:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2024.codfw.wmnet with reason: Rebooting for T303174
  • 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 15:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 15:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 15:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 15:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.5/includes: Backport: ParserOutputAccess: Allow calling getPO with option of not saving in PC (T285993) (duration: 01m 00s)
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1089.eqiad.wmnet
  • 15:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:40 moritzm: drain ganeti2019 T305469
  • 15:39 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1003.eqiad.wmnet with OS bullseye
  • 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1088.eqiad.wmnet
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 15:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 15:31 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 15:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 15:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4036.ulsfo.wmnet
  • 15:26 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:26 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:25 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1003.eqiad.wmnet with reason: host reimage
  • 15:23 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1088.eqiad.wmnet
  • 15:23 mmandere: pool cp5007 with HAProxy as TLS termination layer - T290005
  • 15:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet with reason: Rebooting for T303174
  • 15:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet with reason: Rebooting for T303174
  • 15:20 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1003.eqiad.wmnet with reason: host reimage
  • 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5007.eqsin.wmnet with OS buster
  • 15:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 15:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:12 moritzm: installing atftp security updates
  • 15:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2022.codfw.wmnet with reason: Rebooting for T303174
  • 15:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2022.codfw.wmnet with reason: Rebooting for T303174
  • 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3065.esams.wmnet
  • 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1087.eqiad.wmnet
  • 15:10 razzi@cumin1001: START - Cookbook sre.hosts.reimage for host dbstore1003.eqiad.wmnet with OS bullseye
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2020.codfw.wmnet with reason: Rebooting for T303174
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2020.codfw.wmnet with reason: Rebooting for T303174
  • 15:02 mmandere: pool cp5013 with HAProxy as TLS termination layer - T290005
  • 15:01 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Taking host offline to upgrade to Bullseye
  • 15:01 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Taking host offline to upgrade to Bullseye
  • 15:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5013.eqsin.wmnet with OS buster
  • 14:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2013.codfw.wmnet with reason: Rebooting for T303174
  • 14:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2013.codfw.wmnet with reason: Rebooting for T303174
  • 14:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3065.esams.wmnet
  • 14:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1087.eqiad.wmnet
  • 14:50 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4036.ulsfo.wmnet
  • 14:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5007.eqsin.wmnet with reason: host reimage
  • 14:49 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2012.codfw.wmnet with reason: Rebooting for T303174
  • 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2012.codfw.wmnet with reason: Rebooting for T303174
  • 14:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5007.eqsin.wmnet with reason: host reimage
  • 14:44 vgutierrez: re-pool cp1086
  • 14:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2011.codfw.wmnet with reason: Rebooting for T303174
  • 14:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2011.codfw.wmnet with reason: Rebooting for T303174
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24122 and previous config saved to /var/cache/conftool/dbconfig/20220405-143316-ladsgroup.json
  • 14:31 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
  • 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster1001.eqiad.wmnet with reason: reimage
  • 14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster1001.eqiad.wmnet with reason: reimage
  • 14:31 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
  • 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on pc2014.codfw.wmnet with reason: Rebooting for T303174
  • 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on pc2014.codfw.wmnet with reason: Rebooting for T303174
  • 14:22 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5007.eqsin.wmnet with OS buster
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24121 and previous config saved to /var/cache/conftool/dbconfig/20220405-141811-ladsgroup.json
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:12 mmandere: depool cp5007 for reimage - T290005
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5013.eqsin.wmnet with OS buster
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable videojs on all of DIP wikis (T248418) (duration: 00m 53s)
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24120 and previous config saved to /var/cache/conftool/dbconfig/20220405-140306-ladsgroup.json
  • 13:58 mmandere: depool cp5013 for reimage - T290005
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3004.wikimedia.org
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24119 and previous config saved to /var/cache/conftool/dbconfig/20220405-134801-ladsgroup.json
  • 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deneb.codfw.wmnet
  • 13:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1086.eqiad.wmnet
  • 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3004.wikimedia.org
  • 13:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Cluster re-init for new IP ranges
  • 13:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Cluster re-init for new IP ranges
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deneb.codfw.wmnet
  • 13:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1086.eqiad.wmnet
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster2001.codfw.wmnet with reason: reimage
  • 13:23 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster2001.codfw.wmnet with reason: reimage
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
  • 13:20 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgUdp2logDest the same value as to $wmfUdp2logDest (T45956) (duration: 00m 54s)
  • 13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:17 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Pin CheckUser actor migration to old schema (T233004) (duration: 00m 54s)
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4030.ulsfo.wmnet
  • 13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4030.ulsfo.wmnet
  • 13:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3064.esams.wmnet
  • 13:03 moritzm: installing openssl updates from buster 10.12 point release
  • 13:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
  • 12:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
  • 12:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:54 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3064.esams.wmnet
  • 12:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1085.eqiad.wmnet
  • 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24117 and previous config saved to /var/cache/conftool/dbconfig/20220405-124745-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24116 and previous config saved to /var/cache/conftool/dbconfig/20220405-124732-ladsgroup.json
  • 12:46 mmandere: pool cp6007 with HAProxy as TLS termination layer - T290005
  • 12:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1085.eqiad.wmnet
  • 12:40 mmandere: pool cp5015 with HAProxy as TLS termination layer - T290005
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24115 and previous config saved to /var/cache/conftool/dbconfig/20220405-123227-ladsgroup.json
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:22 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS buster
  • 12:18 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24114 and previous config saved to /var/cache/conftool/dbconfig/20220405-121722-ladsgroup.json
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5015.eqsin.wmnet with OS buster
  • 11:56 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
  • 11:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:52 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.6 refs T305212
  • 11:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
  • 11:48 jnuche@deploy1002: Finished scap: resync wmf.6 to reapply security patches - T305212 (duration: 02m 50s)
  • 11:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
  • 11:45 jnuche@deploy1002: Started scap: resync wmf.6 to reapply security patches - T305212
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 T305427', diff saved to https://phabricator.wikimedia.org/P24112 and previous config saved to /var/cache/conftool/dbconfig/20220405-113944-root.json
  • 11:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS buster
  • 11:31 mmandere: depool cp6007 for reimage - T290005
  • 11:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:23 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5015.eqsin.wmnet with OS buster
  • 11:15 mmandere: depool cp5015 for reimage - T290005
  • 11:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:10 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw1001.eqiad.wmnet
  • 11:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24111 and previous config saved to /var/cache/conftool/dbconfig/20220405-110232-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24110 and previous config saved to /var/cache/conftool/dbconfig/20220405-110224-ladsgroup.json
  • 11:03 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:56 volans: installed spicerack v2.4.0 on the cumin hosts
  • 10:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24109 and previous config saved to /var/cache/conftool/dbconfig/20220405-104719-ladsgroup.json
  • 10:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 10:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24108 and previous config saved to /var/cache/conftool/dbconfig/20220405-103214-ladsgroup.json
  • 10:30 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:30 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:30 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 10:19 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:18 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24107 and previous config saved to /var/cache/conftool/dbconfig/20220405-101709-ladsgroup.json
  • 09:49 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24105 and previous config saved to /var/cache/conftool/dbconfig/20220405-091947-ladsgroup.json
  • 09:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24104 and previous config saved to /var/cache/conftool/dbconfig/20220405-091939-ladsgroup.json
  • 09:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.39.0-wmf.6"
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24103 and previous config saved to /var/cache/conftool/dbconfig/20220405-090434-ladsgroup.json
  • 08:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24102 and previous config saved to /var/cache/conftool/dbconfig/20220405-084928-ladsgroup.json
  • 08:49 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 08:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:35 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24101 and previous config saved to /var/cache/conftool/dbconfig/20220405-083423-ladsgroup.json
  • 08:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:31 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.6 refs T305212
  • 08:28 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:26 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dragonfly-supernode2001.codfw.wmnet
  • 08:23 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bullseye
  • 08:21 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.6 refs T305212 (duration: 42m 53s)
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
  • 08:13 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:13 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:52 XioNoX: disable BGP to Tata in drmrs for circuit move - T298208
  • 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:38 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.6 refs T305212
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24100 and previous config saved to /var/cache/conftool/dbconfig/20220405-073617-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24099 and previous config saved to /var/cache/conftool/dbconfig/20220405-073608-ladsgroup.json
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24098 and previous config saved to /var/cache/conftool/dbconfig/20220405-072103-ladsgroup.json
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24097 and previous config saved to /var/cache/conftool/dbconfig/20220405-070558-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24096 and previous config saved to /var/cache/conftool/dbconfig/20220405-065053-ladsgroup.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1132 T301879', diff saved to https://phabricator.wikimedia.org/P24095 and previous config saved to /var/cache/conftool/dbconfig/20220405-063648-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 into API for testing T301879', diff saved to https://phabricator.wikimedia.org/P24094 and previous config saved to /var/cache/conftool/dbconfig/20220405-060124-marostegui.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 for testing T301879', diff saved to https://phabricator.wikimedia.org/P24093 and previous config saved to /var/cache/conftool/dbconfig/20220405-055256-marostegui.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24092 and previous config saved to /var/cache/conftool/dbconfig/20220405-054610-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24091 and previous config saved to /var/cache/conftool/dbconfig/20220405-054602-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24090 and previous config saved to /var/cache/conftool/dbconfig/20220405-053057-ladsgroup.json
  • 05:17 _joe_: uploading new minor version of conftool to apt for buster/bullseye (requestctl new feature)
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24089 and previous config saved to /var/cache/conftool/dbconfig/20220405-051552-ladsgroup.json
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24088 and previous config saved to /var/cache/conftool/dbconfig/20220405-050047-ladsgroup.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1132 for testing T301879', diff saved to https://phabricator.wikimedia.org/P24087 and previous config saved to /var/cache/conftool/dbconfig/20220405-043426-marostegui.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24086 and previous config saved to /var/cache/conftool/dbconfig/20220405-040309-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 04:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24085 and previous config saved to /var/cache/conftool/dbconfig/20220405-040301-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24084 and previous config saved to /var/cache/conftool/dbconfig/20220405-034756-ladsgroup.json
  • 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24083 and previous config saved to /var/cache/conftool/dbconfig/20220405-033251-ladsgroup.json
  • 03:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24082 and previous config saved to /var/cache/conftool/dbconfig/20220405-031745-ladsgroup.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24081 and previous config saved to /var/cache/conftool/dbconfig/20220405-022132-ladsgroup.json
  • 02:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24080 and previous config saved to /var/cache/conftool/dbconfig/20220405-022124-ladsgroup.json
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24079 and previous config saved to /var/cache/conftool/dbconfig/20220405-020619-ladsgroup.json
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: T305423
  • 01:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5002.eqsin.wmnet with reason: downtimed because of hardware failure: T305423
  • 01:57 eileen: process control config revision changed from 06379640 to 25728a0e
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24078 and previous config saved to /var/cache/conftool/dbconfig/20220405-015114-ladsgroup.json
  • 01:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp5002.eqsin.wmnet
  • 01:42 eileen: civicrm revision changed from 84c737b6 to 87bc3114
  • 01:37 eileen: config revision changed from bb0e1af3 to 06379640
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24077 and previous config saved to /var/cache/conftool/dbconfig/20220405-013609-ladsgroup.json
  • 01:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet
  • 01:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet
  • 01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
  • 01:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet
  • 00:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4034.ulsfo.wmnet
  • 00:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5016.eqsin.wmnet
  • 00:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet
  • 00:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet
  • 00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4034.ulsfo.wmnet
  • 00:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet
  • 00:43 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5016.eqsin.wmnet
  • 00:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet
  • 00:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet
  • 00:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet
  • 00:39 mutante: gitlab1001 - mv 1648814678_2022_04_01_14.9.1_gitlab_backup.tar and other files from April 2nd/April 3rd over from /srv/gitlab-backup to /mnt/gitlab-backup to prevent another outage due to disk space T274463
  • 00:36 mutante: gitlab2001 - apt-get clean to prevent disk space issues
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24076 and previous config saved to /var/cache/conftool/dbconfig/20220405-003419-ladsgroup.json
  • 00:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24075 and previous config saved to /var/cache/conftool/dbconfig/20220405-003405-ladsgroup.json
  • 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet
  • 00:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
  • 00:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1047.eqiad.wmnet
  • 00:32 mutante: gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover... T274463 - <+icinga-wm> RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK
  • 00:30 mutante: gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover...
  • 00:27 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1048.eqiad.wmnet
  • 00:23 mutante: wtp1046, wtp1047, wtp1048 - rebooting, one at a time
  • 00:21 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp104[6-8].eqiad.wmnet
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24074 and previous config saved to /var/cache/conftool/dbconfig/20220405-001900-ladsgroup.json
  • 00:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet
  • 00:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet
  • 00:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24073 and previous config saved to /var/cache/conftool/dbconfig/20220405-000355-ladsgroup.json

2022-04-04

  • 23:51 mutante: apt1001 - importing gitlab-runner package for bullseye via: 'sudo -E reprepro --noskipold --component thirdparty/gitlab-runner update bullseye-wikimedia' after gerrit:767604 (T297659)
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24072 and previous config saved to /var/cache/conftool/dbconfig/20220404-234850-ladsgroup.json
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24071 and previous config saved to /var/cache/conftool/dbconfig/20220404-224836-ladsgroup.json
  • 22:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24070 and previous config saved to /var/cache/conftool/dbconfig/20220404-224828-ladsgroup.json
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24069 and previous config saved to /var/cache/conftool/dbconfig/20220404-223323-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24068 and previous config saved to /var/cache/conftool/dbconfig/20220404-221818-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24067 and previous config saved to /var/cache/conftool/dbconfig/20220404-220313-ladsgroup.json
  • 21:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
  • 21:14 mutante: puppetmaster1001/puppetmaster2003 - geoip / maxmind database update timers renamed: 'geoip_update_legacy' became 'geoip_update_main', 'geoip_update' became 'geoip_update_ipinfo'. The confusing 'legacy' term is no longer used, as suggested in T303464
  • 21:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet
  • 21:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet
  • 21:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
  • 21:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet
  • 21:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24066 and previous config saved to /var/cache/conftool/dbconfig/20220404-205932-ladsgroup.json
  • 20:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 20:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24065 and previous config saved to /var/cache/conftool/dbconfig/20220404-205924-ladsgroup.json
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24064 and previous config saved to /var/cache/conftool/dbconfig/20220404-204419-ladsgroup.json
  • 20:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
  • 20:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
  • 20:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3061.esams.wmnet
  • 20:32 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
  • 20:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
  • 20:30 urbanecm: UTC late B&C window completed
  • 20:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3061.esams.wmnet
  • 20:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8c81de9: Remove wgWMEIPAddressCopyActionEnabled from Beta and production config (T296469) (duration: 00m 51s)
  • 20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24063 and previous config saved to /var/cache/conftool/dbconfig/20220404-202914-ladsgroup.json
  • 20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5006.eqsin.wmnet
  • 20:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:16 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5006.eqsin.wmnet
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24062 and previous config saved to /var/cache/conftool/dbconfig/20220404-201409-ladsgroup.json
  • 20:11 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
  • 20:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3060.esams.wmnet
  • 20:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
  • 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
  • 20:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp3060.esams.wmnet
  • 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
  • 19:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5005.eqsin.wmnet
  • 19:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
  • 19:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2040.codfw.wmnet
  • 19:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
  • 19:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2040.codfw.wmnet
  • 19:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
  • 19:38 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
  • 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1002.eqiad.wmnet
  • 19:35 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1002.eqiad.wmnet
  • 19:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2002.codfw.wmnet
  • 19:33 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon2002.codfw.wmnet
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
  • 19:22 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1001.eqiad.wmnet
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24061 and previous config saved to /var/cache/conftool/dbconfig/20220404-191750-ladsgroup.json
  • 19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24060 and previous config saved to /var/cache/conftool/dbconfig/20220404-191743-ladsgroup.json
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-tls
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-be
  • 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=varnish-fe
  • 19:16 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1001.eqiad.wmnet
  • 19:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4026.ulsfo.wmnet
  • 19:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3059.esams.wmnet
  • 19:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp5005.eqsin.wmnet
  • 19:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4026.ulsfo.wmnet
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24059 and previous config saved to /var/cache/conftool/dbconfig/20220404-190238-ladsgroup.json
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3059.esams.wmnet
  • 18:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2039.codfw.wmnet
  • 18:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
  • 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
  • 18:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2039.codfw.wmnet
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24058 and previous config saved to /var/cache/conftool/dbconfig/20220404-184733-ladsgroup.json
  • 18:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3058.esams.wmnet
  • 18:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
  • 18:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4025.ulsfo.wmnet
  • 18:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
  • 18:39 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4025.ulsfo.wmnet
  • 18:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage2001.codfw.wmnet
  • 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3058.esams.wmnet
  • 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
  • 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
  • 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2038.codfw.wmnet
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24057 and previous config saved to /var/cache/conftool/dbconfig/20220404-183227-ladsgroup.json
  • 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
  • 18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2038.codfw.wmnet
  • 18:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
  • 18:25 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
  • 18:25 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
  • 18:25 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
  • 18:08 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
  • 17:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
  • 17:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 17:27 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24056 and previous config saved to /var/cache/conftool/dbconfig/20220404-172707-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24055 and previous config saved to /var/cache/conftool/dbconfig/20220404-172659-ladsgroup.json
  • 17:25 XioNoX: push urpf DHCP exception to all core routers with urpf configured - T285461
  • 17:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
  • 17:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2037.codfw.wmnet
  • 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2037.codfw.wmnet
  • 17:16 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24054 and previous config saved to /var/cache/conftool/dbconfig/20220404-171154-ladsgroup.json
  • 17:11 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:10 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24053 and previous config saved to /var/cache/conftool/dbconfig/20220404-165649-ladsgroup.json
  • 16:50 taavi: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Brand" "Brand/Archive" "Majavah" --reason 'phab:T305387' # T305387
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24052 and previous config saved to /var/cache/conftool/dbconfig/20220404-164144-ladsgroup.json
  • 16:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 16:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 16:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 16:09 volans: uploaded spicerack_2.4.0 to apt.wikimedia.org bullseye-wikimedia
  • 16:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:08 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 16:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:02 bblack@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
  • 16:00 bblack@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
  • 15:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 15:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 15:44 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24051 and previous config saved to /var/cache/conftool/dbconfig/20220404-153846-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24050 and previous config saved to /var/cache/conftool/dbconfig/20220404-153839-ladsgroup.json
  • 15:28 moritzm: remove stray debmonitor-server/cumin installs (cleanup of 548425b)
  • 15:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1002.eqiad.wmnet
  • 15:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24049 and previous config saved to /var/cache/conftool/dbconfig/20220404-152333-ladsgroup.json
  • 15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host releases1002.eqiad.wmnet
  • 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Use "unexpectedUnconnectedPage" page prop on Beta (production no-op) (duration: 00m 50s)
  • 15:17 mmandere: pool cp6015 with HAProxy as TLS termination layer - T290005
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24048 and previous config saved to /var/cache/conftool/dbconfig/20220404-150828-ladsgroup.json
  • 15:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
  • 15:05 mmandere: pool cp5008 with HAProxy as TLS termination layer - T290005
  • 15:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5008.eqsin.wmnet with OS buster
  • 14:55 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24047 and previous config saved to /var/cache/conftool/dbconfig/20220404-145323-ladsgroup.json
  • 14:44 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 14:44 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
  • 14:42 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 14:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 14:37 herron: rebooting alert2001
  • 14:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
  • 14:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2002.codfw.wmnet
  • 14:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host releases2002.codfw.wmnet
  • 14:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
  • 14:16 mmandere: depool cp6015 for reimage - T290005
  • 14:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5008.eqsin.wmnet with OS buster
  • 14:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 13:58 mmandere: depool cp5008 for reimage - T290005
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24045 and previous config saved to /var/cache/conftool/dbconfig/20220404-135314-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24044 and previous config saved to /var/cache/conftool/dbconfig/20220404-135307-ladsgroup.json
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
  • 13:44 mmandere: pool cp3055 with HAProxy as TLS termination layer - T290005
  • 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24043 and previous config saved to /var/cache/conftool/dbconfig/20220404-133801-ladsgroup.json
  • 13:35 mmandere: pool cp4022 with HAProxy as TLS termination layer - T290005
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
  • 13:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS buster
  • 13:31 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4022.ulsfo.wmnet with OS buster
  • 13:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
  • 13:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24042 and previous config saved to /var/cache/conftool/dbconfig/20220404-132256-ladsgroup.json
  • 13:20 urbanecm: UTC afternoon B&C window done
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 daniel@deploy1002: Synchronized multiversion/defines.php: Config: Always set MW_USE_CONFIG_SCHEMA. (T305176) (duration: 00m 50s)
  • 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24041 and previous config saved to /var/cache/conftool/dbconfig/20220404-130751-ladsgroup.json
  • 13:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:05 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 13:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7ebad8f: Add logo variants for zhwiki (T273578) (duration: 00m 51s)
  • 13:04 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
  • 12:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
  • 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
  • 12:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 12:48 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4022.ulsfo.wmnet with OS buster
  • 12:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
  • 12:43 moritzm: installing gmp security updates
  • 12:42 mmandere: depool cp4022 for reimage - T290005
  • 12:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS buster
  • 12:35 ottomata: removing retention.ms override from eventstreams publicly exposed topics in kafka main-eqiad and main-codfw - T241178
  • 12:31 mmandere: depool cp3055 for reimage - T290005
  • 12:31 ottomata: deleting empty typo topics from kafka main-eqiad: eqiad.mediawiki.page-edit (found while working on T241178)
  • 12:26 ottomata: deleting empty typo topics from kafka main-codfw: codfw.mediawiki.page_delete, codfw.mediawiki.page_move, codfw.mediawiki.page_restore, codfw.mediawiki.revision_create, codfw.mediawiki.revision_visibility_set, codfw.mediawiki.user_block (found while working on T241178)
  • 12:18 moritzm: installing expat updates (follow-ups to earlier security fixes, no security impact by themselves)
  • 12:11 mmandere: pool cp4028 with HAProxy as TLS termination layer - T290005
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24040 and previous config saved to /var/cache/conftool/dbconfig/20220404-121030-ladsgroup.json
  • 12:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24039 and previous config saved to /var/cache/conftool/dbconfig/20220404-121022-ladsgroup.json
  • 12:05 mmandere: pool cp3054 with HAProxy as TLS termination layer - T290005
  • 12:04 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4028.ulsfo.wmnet with OS buster
  • 12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS buster
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24038 and previous config saved to /var/cache/conftool/dbconfig/20220404-115516-ladsgroup.json
  • 11:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24037 and previous config saved to /var/cache/conftool/dbconfig/20220404-114011-ladsgroup.json
  • 11:39 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:37 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
  • 11:37 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
  • 11:34 moritzm: installing zziplib security updates
  • 11:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
  • 11:27 moritzm: installing jbig2dec security updates
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24036 and previous config saved to /var/cache/conftool/dbconfig/20220404-112506-ladsgroup.json
  • 11:20 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4028.ulsfo.wmnet with OS buster
  • 11:18 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:12 mmandere: depool cp4028 for reimage - T290005
  • 11:11 volans: deploying python3-wmflib 1.2.0 fleet-wide
  • 11:09 jforrester@deploy1002: Finished deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block (duration: 00m 08s)
  • 11:09 jforrester@deploy1002: Started deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block
  • 11:07 moritzm: installing cups security updates on buster (client side tools/libs)
  • 11:04 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS buster
  • 10:53 mmandere: depool cp3054 for reimage - T290005
  • 10:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1003.eqiad.wmnet
  • 10:38 volans: uploaded python3-wmflib_1.2.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 10:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1003.eqiad.wmnet
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24035 and previous config saved to /var/cache/conftool/dbconfig/20220404-102616-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24034 and previous config saved to /var/cache/conftool/dbconfig/20220404-102609-ladsgroup.json
  • 10:26 moritzm: installing libxml2 security updates
  • 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1004.eqiad.wmnet
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24033 and previous config saved to /var/cache/conftool/dbconfig/20220404-101104-ladsgroup.json
  • 10:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1004.eqiad.wmnet
  • 10:08 moritzm: installing icu bugfix updates from buster 10.12 point release
  • 09:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1005.eqiad.wmnet
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24032 and previous config saved to /var/cache/conftool/dbconfig/20220404-095558-ladsgroup.json
  • 09:55 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
  • 09:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1005.eqiad.wmnet
  • 09:51 mmandere: pool cp6008 with HAProxy as TLS termination layer - T290005
  • 09:48 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
  • 09:47 moritzm: installing zlib security updates
  • 09:44 mmandere: pool cp5003 with HAProxy as TLS termination layer - T290005
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24031 and previous config saved to /var/cache/conftool/dbconfig/20220404-094053-ladsgroup.json
  • 09:31 moritzm: rolling restart of FPM/Apache on mw canaries to pick up updated zlib/glibc/openssl/libxml
  • 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 09:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 09:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
  • 09:26 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 09:25 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5003.eqsin.wmnet with OS buster
  • 09:16 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 09:12 moritzm: installing openssl updates from Buster 10.12 point release
  • 09:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 08:59 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 08:59 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
  • 08:56 moritzm: installing glibc updates from buster 10.12 point release
  • 08:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P24030 and previous config saved to /var/cache/conftool/dbconfig/20220404-084523-root.json
  • 08:43 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:42 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
  • 08:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:37 moritzm: installing flac security updates
  • 08:37 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:37 mmandere: depool cp6008 for reimage - T290005
  • 08:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:31 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:31 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24029 and previous config saved to /var/cache/conftool/dbconfig/20220404-083031-ladsgroup.json
  • 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5003.eqsin.wmnet with OS buster
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:25 urbanecm@deploy1002: Synchronized logos/config.yaml: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (3/3) (duration: 00m 50s)
  • 08:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (2/3) (duration: 00m 50s)
  • 08:23 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (1/3) (duration: 00m 51s)
  • 08:19 mmandere: depool cp5003 for reimage - T290005
  • 08:02 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 14s)
  • 08:01 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 07:54 jayme: imported scap 4.6.0 to stretch-/buster-/bullseye-wikimedia - T305250
  • 07:44 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:43 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 07:43 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 07:43 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 07:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:39 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:23 taavi: UTC morning deployments done
  • 07:21 taavi@deploy1002: Synchronized wmf-config/throttle.php: Config: throttle: removed expired rule (T304836) (duration: 00m 49s)
  • 07:19 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: Revert "fawiki: Set celebration logo for new vector" (T304314) (duration: 00m 49s)
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "fawiki: Set celebration logo for new vector" (T304314) (duration: 00m 50s)
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:15 taavi@deploy1002: Synchronized static/images/project-logos: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 50s)
  • 07:14 taavi@deploy1002: Synchronized logos/config.yaml: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 50s)
  • 07:13 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 51s)
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content and Section Translation for Persian Wikipedia (T296475) (duration: 00m 51s)
  • 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24027 and previous config saved to /var/cache/conftool/dbconfig/20220404-060542-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24026 and previous config saved to /var/cache/conftool/dbconfig/20220404-055037-ladsgroup.json
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1130.eqiad.wmnet with OS bullseye
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24025 and previous config saved to /var/cache/conftool/dbconfig/20220404-053531-ladsgroup.json
  • 05:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
  • 05:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24024 and previous config saved to /var/cache/conftool/dbconfig/20220404-052026-ladsgroup.json
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1130.eqiad.wmnet with OS bullseye
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24023 and previous config saved to /var/cache/conftool/dbconfig/20220404-041545-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 03:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 02:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 02:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance

2022-04-02

  • 11:26 akosiaris: disable zotero paging until T291707 is resolved.
  • 11:11 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 11:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync

2022-04-01

  • 23:25 mutante: DNS - new project language 'kcg'. 'Tyap is a regionally important dialect cluster of Plateau languages in Nigeria's Middle Belt, named after its prestige dialect. It is also known by its Hausa exonym as Katab or Kataf.' T305279
  • 23:08 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 23:08 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 22:04 bblack: esams re-pooled - T304089
  • 20:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:48 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
  • 19:47 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
  • 19:44 mutante: rebooting parsoid canary appservers - wtp1025, wtp1026, parse2001, parse2002
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].eqiad.wmnet
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=parse200[1-2].eqiad.wmnet
  • 19:37 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
  • 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
  • 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1450.eqiad.wmnet
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=varnish-fe
  • 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-tls
  • 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-be
  • 19:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
  • 19:16 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
  • 19:01 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
  • 19:00 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp2036.codfw.wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1414.wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw1414.wmnet
  • 18:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw141[4-8].wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2036.codfw.wmnet
  • 13:05 dcausse: resetting jvmquake flag on all wdqs hosts
  • 12:52 dcausse: restarting blazegraph on wdqs1006 and resetting jvmquake warning flag
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 11:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 10:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 10:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 10:47 vgutierrez: reboot acme-chief instances to catch up on kernel upgrades
  • 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6002.drmrs.wmnet
  • 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6002.drmrs.wmnet
  • 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6001.drmrs.wmnet
  • 10:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6001.drmrs.wmnet
  • 10:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet
  • 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet
  • 10:06 vgutierrez: vgutierrez@puppetmaster2001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
  • 10:04 vgutierrez: vgutierrez@puppetmaster1001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
  • 10:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet
  • 09:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet
  • 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet
  • 09:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet
  • 09:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet
  • 09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet
  • 09:35 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir3002.esams.wmnet
  • 09:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet
  • 09:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet
  • 09:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet
  • 09:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
  • 09:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
  • 09:10 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir2001.codfw.wmnet
  • 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2001.codfw.wmnet
  • 08:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet
  • 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet
  • 08:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet
  • 08:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
  • 08:48 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ncredir1001.eqiad.wmnet
  • 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
  • 08:44 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 08:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:42 vgutierrez: rolling restart of ncredir instances to catch up on kernel upgrades
  • 06:54 XioNoX: traffic engineering in drmrs to prevent link saturation

Archives

See Server Admin Log/Archives.