You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org
Server Admin Log
2022-04-05
- 00:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4034.ulsfo.wmnet
- 00:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5016.eqsin.wmnet
- 00:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet
- 00:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet
- 00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4034.ulsfo.wmnet
- 00:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet
- 00:43 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5016.eqsin.wmnet
- 00:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet
- 00:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet
- 00:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet
- 00:39 mutante: gitlab1001 - mv 1648814678_2022_04_01_14.9.1_gitlab_backup.tar and other files from April 2nd/April 3rd over from /srv/gitlab-backup to /mnt/gitlab-backup to prevent another outage due to disk space T274463
- 00:36 mutante: gitlab2001 - apt-get clean to prevent disk space issues
- 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24076 and previous config saved to /var/cache/conftool/dbconfig/20220405-003419-ladsgroup.json
- 00:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
- 00:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
- 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24075 and previous config saved to /var/cache/conftool/dbconfig/20220405-003405-ladsgroup.json
- 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet
- 00:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1046.eqiad.wmnet
- 00:33 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1047.eqiad.wmnet
- 00:32 mutante: gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover... T274463 - <+icinga-wm> RECOVERY - Gitlab HTTPS healthcheck on gitlab.wikimedia.org is OK
- 00:30 mutante: gitlab.wikimedia.org was down because gitlab1001 ran out of disk space. ran 'apt-get clean' to free 13G which made it recover...
- 00:27 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp1048.eqiad.wmnet
- 00:23 mutante: wtp1046, wtp1047, wtp1048 - rebooting, one at a time
- 00:21 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp104[6-8].eqiad.wmnet
- 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24074 and previous config saved to /var/cache/conftool/dbconfig/20220405-001900-ladsgroup.json
- 00:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet
- 00:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet
- 00:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet
- 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P24073 and previous config saved to /var/cache/conftool/dbconfig/20220405-000355-ladsgroup.json
2022-04-04
- 23:51 mutante: apt1001 - importing gitlab-runner package for bullseye via: 'sudo -E reprepro --noskipold --component thirdparty/gitlab-runner update bullseye-wikimedia' after gerrit:767604 (T297659)
- 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24072 and previous config saved to /var/cache/conftool/dbconfig/20220404-234850-ladsgroup.json
- 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298565)', diff saved to https://phabricator.wikimedia.org/P24071 and previous config saved to /var/cache/conftool/dbconfig/20220404-224836-ladsgroup.json
- 22:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 22:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24070 and previous config saved to /var/cache/conftool/dbconfig/20220404-224828-ladsgroup.json
- 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24069 and previous config saved to /var/cache/conftool/dbconfig/20220404-223323-ladsgroup.json
- 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P24068 and previous config saved to /var/cache/conftool/dbconfig/20220404-221818-ladsgroup.json
- 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24067 and previous config saved to /var/cache/conftool/dbconfig/20220404-220313-ladsgroup.json
- 21:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
- 21:14 mutante: puppetmaster1001/puppetmaster2003 - geoip / maxmind database update timers renamed. 'geoip_update_legacy' became 'geoip_update_main', 'geoip_update' became 'geoip_update_ipinfo'. Not using the confusing 'legacy' term anymore as was suggested as part of (T303464)
- 21:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet
- 21:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet
- 21:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
- 21:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet
- 21:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet
- 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24066 and previous config saved to /var/cache/conftool/dbconfig/20220404-205932-ladsgroup.json
- 20:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
- 20:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
- 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24065 and previous config saved to /var/cache/conftool/dbconfig/20220404-205924-ladsgroup.json
- 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24064 and previous config saved to /var/cache/conftool/dbconfig/20220404-204419-ladsgroup.json
- 20:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
- 20:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
- 20:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3061.esams.wmnet
- 20:32 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
- 20:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
- 20:30 urbanecm: UTC late B&C window completed
- 20:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3061.esams.wmnet
- 20:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8c81de9: Remove wgWMEIPAddressCopyActionEnabled from Beta and production config (T296469) (duration: 00m 51s)
- 20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P24063 and previous config saved to /var/cache/conftool/dbconfig/20220404-202914-ladsgroup.json
- 20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5006.eqsin.wmnet
- 20:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
- 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
- 20:16 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5006.eqsin.wmnet
- 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
- 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 20:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
- 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24062 and previous config saved to /var/cache/conftool/dbconfig/20220404-201409-ladsgroup.json
- 20:11 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
- 20:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3060.esams.wmnet
- 20:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
- 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
- 20:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp3060.esams.wmnet
- 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
- 19:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5005.eqsin.wmnet
- 19:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
- 19:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2040.codfw.wmnet
- 19:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
- 19:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2040.codfw.wmnet
- 19:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
- 19:38 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
- 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1002.eqiad.wmnet
- 19:35 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1002.eqiad.wmnet
- 19:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2002.codfw.wmnet
- 19:33 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon2002.codfw.wmnet
- 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
- 19:22 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1001.eqiad.wmnet
- 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298565)', diff saved to https://phabricator.wikimedia.org/P24061 and previous config saved to /var/cache/conftool/dbconfig/20220404-191750-ladsgroup.json
- 19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24060 and previous config saved to /var/cache/conftool/dbconfig/20220404-191743-ladsgroup.json
- 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-tls
- 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=ats-be
- 19:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5005.eqsin.wmnet,service=varnish-fe
- 19:16 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1001.eqiad.wmnet
- 19:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4026.ulsfo.wmnet
- 19:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3059.esams.wmnet
- 19:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp5005.eqsin.wmnet
- 19:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4026.ulsfo.wmnet
- 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24059 and previous config saved to /var/cache/conftool/dbconfig/20220404-190238-ladsgroup.json
- 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3059.esams.wmnet
- 18:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2039.codfw.wmnet
- 18:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
- 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
- 18:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2039.codfw.wmnet
- 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P24058 and previous config saved to /var/cache/conftool/dbconfig/20220404-184733-ladsgroup.json
- 18:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3058.esams.wmnet
- 18:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
- 18:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4025.ulsfo.wmnet
- 18:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
- 18:39 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4025.ulsfo.wmnet
- 18:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage2001.codfw.wmnet
- 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3058.esams.wmnet
- 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
- 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
- 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2038.codfw.wmnet
- 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24057 and previous config saved to /var/cache/conftool/dbconfig/20220404-183227-ladsgroup.json
- 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
- 18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2038.codfw.wmnet
- 18:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
- 18:25 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
- 18:25 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
- 18:25 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
- 18:08 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
- 17:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
- 17:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
- 17:27 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
- 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298565)', diff saved to https://phabricator.wikimedia.org/P24056 and previous config saved to /var/cache/conftool/dbconfig/20220404-172707-ladsgroup.json
- 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
- 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
- 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24055 and previous config saved to /var/cache/conftool/dbconfig/20220404-172659-ladsgroup.json
- 17:25 XioNoX: push urpf DHCP exception to all core routers with urpf configured - T285461
- 17:24 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
- 17:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2037.codfw.wmnet
- 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2037.codfw.wmnet
- 17:16 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
- 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24054 and previous config saved to /var/cache/conftool/dbconfig/20220404-171154-ladsgroup.json
- 17:11 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 17:10 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 17:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 17:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
- 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P24053 and previous config saved to /var/cache/conftool/dbconfig/20220404-165649-ladsgroup.json
- 16:50 taavi: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Brand" "Brand/Archive" "Majavah" --reason 'phab:T305387' # T305387
- 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24052 and previous config saved to /var/cache/conftool/dbconfig/20220404-164144-ladsgroup.json
- 16:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
- 16:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
- 16:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
- 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
- 16:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
- 16:09 volans: uploaded spicerack_2.4.0 to apt.wikimedia.org bullseye-wikimedia
- 16:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
- 16:08 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
- 16:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
- 16:02 bblack@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
- 16:00 bblack@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 1 hosts matching query P{cp2027.codfw.wmnet}
- 15:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
- 15:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
- 15:44 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298565)', diff saved to https://phabricator.wikimedia.org/P24051 and previous config saved to /var/cache/conftool/dbconfig/20220404-153846-ladsgroup.json
- 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24050 and previous config saved to /var/cache/conftool/dbconfig/20220404-153839-ladsgroup.json
- 15:28 moritzm: remove stray debmonitor-server/cumin installs (cleanup of 548425b)
- 15:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
- 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1002.eqiad.wmnet
- 15:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
- 15:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24049 and previous config saved to /var/cache/conftool/dbconfig/20220404-152333-ladsgroup.json
- 15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host releases1002.eqiad.wmnet
- 15:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 15:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Use "unexpectedUnconnectedPage" page prop on Beta (production no-op) (duration: 00m 50s)
- 15:17 mmandere: pool cp6015 with HAProxy as TLS termination layer - T290005
- 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P24048 and previous config saved to /var/cache/conftool/dbconfig/20220404-150828-ladsgroup.json
- 15:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
- 15:05 mmandere: pool cp5008 with HAProxy as TLS termination layer - T290005
- 15:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5008.eqsin.wmnet with OS buster
- 14:55 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org
- 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24047 and previous config saved to /var/cache/conftool/dbconfig/20220404-145323-ladsgroup.json
- 14:44 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
- 14:44 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
- 14:42 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
- 14:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 14:37 herron: rebooting alert2001
- 14:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
- 14:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5008.eqsin.wmnet with reason: host reimage
- 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2002.codfw.wmnet
- 14:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host releases2002.codfw.wmnet
- 14:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
- 14:16 mmandere: depool cp6015 for reimage - T290005
- 14:08 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5008.eqsin.wmnet with OS buster
- 14:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 13:58 mmandere: depool cp5008 for reimage - T290005
- 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298565)', diff saved to https://phabricator.wikimedia.org/P24045 and previous config saved to /var/cache/conftool/dbconfig/20220404-135314-ladsgroup.json
- 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
- 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
- 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24044 and previous config saved to /var/cache/conftool/dbconfig/20220404-135307-ladsgroup.json
- 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
- 13:44 mmandere: pool cp3055 with HAProxy as TLS termination layer - T290005
- 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
- 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24043 and previous config saved to /var/cache/conftool/dbconfig/20220404-133801-ladsgroup.json
- 13:35 mmandere: pool cp4022 with HAProxy as TLS termination layer - T290005
- 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
- 13:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS buster
- 13:31 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4022.ulsfo.wmnet with OS buster
- 13:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
- 13:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
- 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P24042 and previous config saved to /var/cache/conftool/dbconfig/20220404-132256-ladsgroup.json
- 13:20 urbanecm: UTC afternoon B&C window done
- 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
- 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
- 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 13:18 daniel@deploy1002: Synchronized multiversion/defines.php: Config: Always set MW_USE_CONFIG_SCHEMA. (T305176) (duration: 00m 50s)
- 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
- 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 13:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
- 13:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
- 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
- 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 13:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24041 and previous config saved to /var/cache/conftool/dbconfig/20220404-130751-ladsgroup.json
- 13:07 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
- 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 13:05 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
- 13:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7ebad8f: Add logo variants for zhwiki (T273578) (duration: 00m 51s)
- 13:04 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4022.ulsfo.wmnet with reason: host reimage
- 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
- 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
- 13:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
- 12:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bullseye
- 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
- 12:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
- 12:48 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4022.ulsfo.wmnet with OS buster
- 12:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bullseye
- 12:43 moritzm: installing gmp security updates
- 12:42 mmandere: depool cp4022 for reimage - T290005
- 12:38 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS buster
- 12:35 ottomata: removing retention.ms override from eventstreams publicly exposed topics in kafka main-eqiad and main-codfw - T241178
- 12:31 mmandere: depool cp3055 for reimage - T290005
- 12:31 ottomata: deleting empty typo topics from kafka main-eqiad: eqiad.mediawiki.page-edit (found while working on T241178)
- 12:26 ottomata: deleting empty typo topics from kafka main-codfw: codfw.mediawiki.page_delete, codfw.mediawiki.page_move, codfw.mediawiki.page_restore, codfw.mediawiki.revision_create, codfw.mediawiki.revision_visibility_set, codfw.mediawiki.user_block (found while working on T241178)
- 12:18 moritzm: installing expat updates (follow-ups to earlier security fixes, no security impact by itself)
- 12:11 mmandere: pool cp4028 with HAProxy as TLS termination layer - T290005
- 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298565)', diff saved to https://phabricator.wikimedia.org/P24040 and previous config saved to /var/cache/conftool/dbconfig/20220404-121030-ladsgroup.json
- 12:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
- 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
- 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24039 and previous config saved to /var/cache/conftool/dbconfig/20220404-121022-ladsgroup.json
- 12:05 mmandere: pool cp3054 with HAProxy as TLS termination layer - T290005
- 12:04 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4028.ulsfo.wmnet with OS buster
- 12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 12:01 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS buster
- 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24038 and previous config saved to /var/cache/conftool/dbconfig/20220404-115516-ladsgroup.json
- 11:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 11:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 11:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
- 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P24037 and previous config saved to /var/cache/conftool/dbconfig/20220404-114011-ladsgroup.json
- 11:39 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 11:37 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4028.ulsfo.wmnet with reason: host reimage
- 11:37 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
- 11:34 moritzm: installing zziplib security updates
- 11:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
- 11:27 moritzm: installing jbig2dec security updates
- 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24036 and previous config saved to /var/cache/conftool/dbconfig/20220404-112506-ladsgroup.json
- 11:20 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4028.ulsfo.wmnet with OS buster
- 11:18 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 11:12 mmandere: depool cp4028 for reimage - T290005
- 11:11 volans: deploying python3-wmflib 1.2.0 fleet-wide
- 11:09 jforrester@deploy1002: Finished deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block (duration: 00m 08s)
- 11:09 jforrester@deploy1002: Started deploy [integration/docroot@63b762d]: Id56cd5bf64ed Adding WikiLambda doc block
- 11:07 moritzm: installing cups security updates on buster (client-side tools/libs)
- 11:04 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS buster
- 10:53 mmandere: depool cp3054 for reimage - T290005
- 10:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1003.eqiad.wmnet
- 10:38 volans: uploaded python3-wmflib_1.2.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
- 10:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1003.eqiad.wmnet
- 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298565)', diff saved to https://phabricator.wikimedia.org/P24035 and previous config saved to /var/cache/conftool/dbconfig/20220404-102616-ladsgroup.json
- 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
- 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
- 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24034 and previous config saved to /var/cache/conftool/dbconfig/20220404-102609-ladsgroup.json
- 10:26 moritzm: installing libxml2 security updates
- 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1004.eqiad.wmnet
- 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24033 and previous config saved to /var/cache/conftool/dbconfig/20220404-101104-ladsgroup.json
- 10:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1004.eqiad.wmnet
- 10:08 moritzm: installing icu bugfix updates from buster 10.12 point release
- 09:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-druid1005.eqiad.wmnet
- 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P24032 and previous config saved to /var/cache/conftool/dbconfig/20220404-095558-ladsgroup.json
- 09:55 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
- 09:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 09:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-druid1005.eqiad.wmnet
- 09:51 mmandere: pool cp6008 with HAProxy as TLS termination layer - T290005
- 09:48 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
- 09:47 moritzm: installing zlib security updates
- 09:44 mmandere: pool cp5003 with HAProxy as TLS termination layer - T290005
- 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24031 and previous config saved to /var/cache/conftool/dbconfig/20220404-094053-ladsgroup.json
- 09:31 moritzm: rolling restart of FPM/Apache on mw canaries to pick up updated zlib/glibc/openssl/libxml
- 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
- 09:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
- 09:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
- 09:26 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
- 09:25 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5003.eqsin.wmnet with OS buster
- 09:16 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
- 09:12 moritzm: installing openssl updates from Buster 10.12 point release
- 09:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
- 08:59 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
- 08:59 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
- 08:56 moritzm: installing glibc updates from buster 10.12 point release
- 08:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5003.eqsin.wmnet with reason: host reimage
- 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P24030 and previous config saved to /var/cache/conftool/dbconfig/20220404-084523-root.json
- 08:43 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 08:42 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
- 08:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 08:37 moritzm: installing flac security updates
- 08:37 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 08:37 mmandere: depool cp6008 for reimage - T290005
- 08:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 08:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
- 08:31 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 08:31 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
- 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298565)', diff saved to https://phabricator.wikimedia.org/P24029 and previous config saved to /var/cache/conftool/dbconfig/20220404-083031-ladsgroup.json
- 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
- 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
- 08:28 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5003.eqsin.wmnet with OS buster
- 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 08:25 urbanecm@deploy1002: Synchronized logos/config.yaml: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (3/3) (duration: 00m 50s)
- 08:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (2/3) (duration: 00m 50s)
- 08:23 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 158e0ce: Revert "cswiki: Add celebration logo for 500k" (1/3) (duration: 00m 51s)
- 08:19 mmandere: depool cp5003 for reimage - T290005
- 08:02 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 14s)
- 08:01 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
- 07:54 jayme: imported scap 4.6.0 to stretch-/buster-/bullseye-wikimedia - T305250
- 07:44 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 07:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 07:43 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 07:43 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 07:43 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 07:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
- 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
- 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
- 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
- 07:39 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 07:39 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 07:39 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 07:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
- 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
- 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 07:23 taavi: UTC morning deployments done
- 07:21 taavi@deploy1002: Synchronized wmf-config/throttle.php: Config: throttle: removed expired rule (T304836) (duration: 00m 49s)
- 07:19 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: Revert "fawiki: Set celebration logo for new vector" (T304314) (duration: 00m 49s)
- 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
- 07:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "fawiki: Set celebration logo for new vector" (T304314) (duration: 00m 50s)
- 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
- 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 07:15 taavi@deploy1002: Synchronized static/images/project-logos: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 50s)
- 07:14 taavi@deploy1002: Synchronized logos/config.yaml: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 50s)
- 07:13 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: Revert "fawiki: Set new year celebration" (T304314) (duration: 00m 51s)
- 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
- 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
- 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
- 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
- 07:08 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Content and Section Translation for Persian Wikipedia (T296475) (duration: 00m 51s)
- 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 06:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 06:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 06:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24027 and previous config saved to /var/cache/conftool/dbconfig/20220404-060542-ladsgroup.json
- 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24026 and previous config saved to /var/cache/conftool/dbconfig/20220404-055037-ladsgroup.json
- 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1130.eqiad.wmnet with OS bullseye
- 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P24025 and previous config saved to /var/cache/conftool/dbconfig/20220404-053531-ladsgroup.json
- 05:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
- 05:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: host reimage
- 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24024 and previous config saved to /var/cache/conftool/dbconfig/20220404-052026-ladsgroup.json
- 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1130.eqiad.wmnet with OS bullseye
- 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298565)', diff saved to https://phabricator.wikimedia.org/P24023 and previous config saved to /var/cache/conftool/dbconfig/20220404-041545-ladsgroup.json
- 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
- 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
- 03:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
- 03:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
- 02:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
- 02:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
- 01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
- 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
2022-04-02
- 11:26 akosiaris: disable zotero paging until T291707 is resolved.
- 11:11 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
- 11:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
2022-04-01
- 23:25 mutante: DNS - new project language 'kcg'. 'Tyap is a regionally important dialect cluster of Plateau languages in Nigeria's Middle Belt, named after its prestige dialect. It is also known by its Hausa exonym as Katab or Kataf.' T305279
- 23:08 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
- 23:08 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
- 22:04 bblack: esams re-pooled - T304089
- 20:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:19 volans@cumin1001: START - Cookbook sre.dns.netbox
- 19:48 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
- 19:47 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
- 19:44 mutante: rebooting parsoid canary appservers - wtp1025, wtp1026, parse2001, parse2002
- 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
- 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].eqiad.wmnet
- 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=parse200[1-2].eqiad.wmnet
- 19:37 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
- 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
- 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1450.eqiad.wmnet
- 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=varnish-fe
- 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-tls
- 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-be
- 19:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
- 19:16 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
- 19:01 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
- 19:00 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp2036.codfw.wmnet
- 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1414.wmnet
- 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].wmnet
- 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw1414.wmnet
- 18:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw141[4-8].wmnet
- 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2036.codfw.wmnet
- 13:05 dcausse: resetting jvmquake flag on all wdqs hosts
- 12:52 dcausse: restarting blazegraph on wdqs1006 and resetting jvmquake warning flag
- 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
- 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
- 11:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
- 10:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
- 10:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
- 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
- 10:47 vgutierrez: reboot acme-chief instances to catch up on kernel upgrades
- 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6002.drmrs.wmnet
- 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6002.drmrs.wmnet
- 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6001.drmrs.wmnet
- 10:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6001.drmrs.wmnet
- 10:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet
- 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet
- 10:06 vgutierrez: vgutierrez@puppetmaster2001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
- 10:04 vgutierrez: vgutierrez@puppetmaster1001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
- 10:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet
- 09:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet
- 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet
- 09:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet
- 09:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet
- 09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet
- 09:35 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir3002.esams.wmnet
- 09:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet
- 09:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet
- 09:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet
- 09:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
- 09:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
- 09:10 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir2001.codfw.wmnet
- 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2001.codfw.wmnet
- 08:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet
- 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet
- 08:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet
- 08:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
- 08:48 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ncredir1001.eqiad.wmnet
- 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
- 08:44 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
- 08:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 08:42 vgutierrez: rolling restart of ncredir instances to catch up on kernel upgrades
- 06:54 XioNoX: traffic engineering in drmrs to prevent link saturation