Server Admin Log

== 2023-02-03 ==
* 00:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
== 2022-03-31 ==
* 23:45 mutante: gitlab2001 - fdisk /dev/vdb (g, w) (create partition table), (n, w) (create partition) ; mkfs.ext4 /dev/vdb1 (create filesystem); systemctl reset-failed (fix Icinga alert); mkdir /mnt/gitlab-backup; mount /dev/vdb1 /mnt/gitlab-backup ; blkid (get UUID); edit /etc/fstab and insert "UUID=c5235682-ac21-46a9-85ee-9603f694a6a4 /mnt/gitlab-backup ext4 errors=remount-ro 0 2" [[phab:T274463|T274463]]
* 23:27 mutante: gitlab2001 - rebooted on ganeti level (needed when adding new virtual hardware), then ran into the usual bug [[phab:T272555|T272555]] where you have to manually fix the interface in /etc/network/interfaces  [[phab:T274463|T274463]]
* 23:21 mutante: gitlab2001 (gitlab-replica.wikimedia.org) - rebooting to add new virtual disk [[phab:T274463|T274463]]
* 23:11 ejegg: updated payments-wiki from {{Gerrit|47d9bd27}} to {{Gerrit|6f888c28}}
* 23:01 bblack: esams->drmrs failover test begins - [[phab:T304089|T304089]]
* 22:34 moritzm: updated CAS to 6.4.6.2
* 22:28 mutante: ganeti - creating new 100G virtual disk on gitlab1001 [[phab:T274463|T274463]]
* 22:24 mutante: ganeti - creating new 100G virtual disk on gitlab2001 [[phab:T274463|T274463]]
* 22:16 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 22:03 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 22:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 21:51 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 21:48 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 21:40 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:19 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^(cp1075{{!}}cp1079{{!}}cp2035{{!}}cp3050{{!}}cp3051{{!}}cp3052{{!}}cp3054{{!}}cp4022{{!}}cp5013{{!}}cp5014{{!}}cp5015).*
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:17 bblack@cumin1001: conftool action : select; selector: name="^(cp1075{{!}}cp1079{{!}}cp2035{{!}}cp3050{{!}}cp3051{{!}}cp3052{{!}}cp3054{{!}}cp4022{{!}}cp5013{{!}}cp5014{{!}}cp5015).*"
* 21:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: [[gerrit:775876{{!}}Remove unused Flow config]] (duration: 00m 49s)
* 21:07 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp5012.eqsin.wmnet
* 21:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 21:06 thcipriani: utc late backport complete
* 21:03 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 20:59 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 20:56 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/GrowthExperiments/modules/ext.growthExperiments.Homepage.SuggestedEdits/MatchModeSelectWidget.less: Backport: [[gerrit:775371{{!}}Newcomer tasks: always align button and text to the right (T301825)]] (duration: 00m 50s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:49 thcipriani@deploy1002: Synchronized tests: Config (noop -- tests) (duration: 00m 50s)
* 20:47 thcipriani@deploy1002: Synchronized src/StaticSiteConfiguration.php: Config (noop -- comment change): [[gerrit:775427{{!}}phpcs: enable and fix PropertyDocumentation.MissingVar (T171115)]] (duration: 00m 50s)
* 20:46 thcipriani@deploy1002: Synchronized phpcs.xml: Config (noop): [[gerrit:775427{{!}}phpcs: enable and fix PropertyDocumentation.MissingVar (T171115)]] [[gerrit:775426{{!}}phpcs: rename test files to match class names (T171115)]] [[gerrit:775005{{!}}phpcs: enable rules that are already passing (T171115)]] (duration: 00m 49s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 mutante: reserving port 4017 for new k8s service request 'image-suggestions' [[phab:T304891|T304891]]
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:774500{{!}}Stop writing to $wmfLocalServices (T45956)]] (duration: 00m 50s)
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:29 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:774499{{!}}Migrate $wmfLocalServices to $wmgLocalServices (T45956)]] (duration: 00m 51s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2007.codfw.wmnet
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:22 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
* 20:22 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:774497{{!}}Start writing to $wmgLocalServices the same value as to $wmfLocalServices (T45956)]] (duration: 00m 50s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:21 mutante: contint2002 - reboot (insetup host)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:18 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
* 20:17 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2007.codfw.wmnet
* 20:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
* 20:16 thcipriani@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: Config: [[gerrit:774019{{!}}Migrate $wmfServiceConfig to $wmgServiceConfig (T45956)]] (duration: 00m 50s)
* 20:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
* 20:12 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5001.eqsin.wmnet
* 20:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1075.eqiad.wmnet
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2376.codfw.wmnet
* 20:10 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2374.codfw.wmnet
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:09 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2252.codfw.wmnet
* 20:08 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet
* 20:08 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2251.codfw.wmnet
* 20:07 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
* 20:07 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5001.eqsin.wmnet
* 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5014.eqsin.wmnet
* 20:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2376.codfw.wmnet
* 20:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2374.codfw.wmnet
* 20:04 mutante: mw2271,mw2222 - canary appserver, rebooting
* 20:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2035.codfw.wmnet
* 20:04 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4005.ulsfo.wmnet
* 20:01 mutante: mw2251,mw2252 - canary appserver, rebooting
* 20:00 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4005.ulsfo.wmnet
* 19:59 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2272.codfw.wmnet
* 19:59 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
* 19:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2252.codfw.wmnet
* 19:57 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2251.codfw.wmnet
* 19:55 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
* 19:46 mutante: phab2001 - systemctl restart ssh-phab
* 19:45 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
* 19:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
* 19:43 rzl: Rolling-restarted zotero to un-wedge wedged pods with offscale high CPU
* 19:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
* 19:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
* 19:38 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2008.codfw.wmnet
* 19:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5014.eqsin.wmnet
* 19:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
* 19:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
* 19:28 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
* 19:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5015.eqsin.wmnet
* 19:26 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2008.codfw.wmnet
* 19:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
* 19:24 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
* 19:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
* 19:23 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
* 19:21 cwhite: remove openjdk-8-jre from eqiad logstash nodes [[phab:T301770|T301770]]
* 19:21 mutante: phab2001 - powercycling via mgmt
* 19:20 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
* 19:20 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
* 19:19 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
* 19:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
* 19:15 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
* 19:15 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
* 19:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
* 19:14 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
* 19:14 mutante: phab2001 - git-ssh.codfw - rebooting - might cause pybal alert
* 19:13 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5015.eqsin.wmnet
* 19:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4022.ulsfo.wmnet
* 19:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
* 19:09 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
* 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2035.codfw.wmnet
* 19:07 bblack@cumin1001: conftool action : set/pooled=yes; selector: cluster=ml_staging
* 19:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1075.eqiad.wmnet
* 19:07 bblack@cumin1001: conftool action : set/weight=1; selector: cluster=ml_staging
* 19:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5013.eqsin.wmnet
* 19:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
* 19:06 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5002.eqsin.wmnet
* 19:05 mutante: doc.wikimedia.org - short downtime due to maintenance, rebooting doc1001
* 19:02 mutante: testreduce1001 - needed manual nginx restart after reboot to make https://parsoid-rt-tests.wikimedia.org/ work again
* 19:01 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5002.eqsin.wmnet
* 19:00 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.1-1+deb11u1_source.changes
* 19:00 mutante: testreduce1001 - rebooting
* 18:59 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4006.ulsfo.wmnet
* 18:59 mutante: https://parsoid-rt-tests.wikimedia.org/ - short downtime due to maintenance
* 18:59 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4022.ulsfo.wmnet
* 18:56 mutante: scandium - rebooting
* 18:54 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4006.ulsfo.wmnet
* 18:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
* 18:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5013.eqsin.wmnet
* 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3054.esams.wmnet
* 18:50 mutante: mwdebug1001 - rebooting
* 18:49 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
* 18:43 duesen: removing /var/run/php/use-config-schema  from canaries mw1415, mw1438, and mw1448 to disable config schema loading ([[phab:T304460|T304460]])
* 18:41 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
* 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3054.esams.wmnet
* 18:36 mutante: gerrit-replica.wikimedia.org short downtime, rebooting gerrit2001
* 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:23 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/TimedMediaHandler/resources/ext.tmh.player.styles.less: Backport: [[gerrit:775443{{!}}Set noflip for css rule that needs it (T305156)]] (duration: 00m 51s)
* 18:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:20 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2009.codfw.wmnet
* 18:19 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@ba88f51]: 0.3.109 (duration: 07m 24s)
* 18:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns2001.wikimedia.org
* 18:13 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.109` on canary `wdqs1003`; proceeding to rest of fleet
* 18:11 ryankemper@deploy1002: Started deploy [wdqs/wdqs@ba88f51]: 0.3.109
* 18:11 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.109`. Pre-deploy tests passing on canary `wdqs1003`
* 18:08 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2009.codfw.wmnet
* 18:03 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
* 17:57 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
* 17:52 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host authdns2001.wikimedia.org
* 17:47 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns1001.wikimedia.org
* 17:41 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host authdns1001.wikimedia.org
* 17:37 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
* 17:31 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1001.wikimedia.org
* 17:30 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
* 17:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5003.eqsin.wmnet
* 17:25 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns1001.wikimedia.org
* 17:25 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2001.wikimedia.org
* 17:24 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5003.eqsin.wmnet
* 17:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet
* 17:17 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet
* 17:17 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
* 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Maint', diff saved to https://phabricator.wikimedia.org/P24019 and previous config saved to /var/cache/conftool/dbconfig/20220331-171724-ladsgroup.json
* 17:10 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
* 17:10 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2010.codfw.wmnet
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Maint', diff saved to https://phabricator.wikimedia.org/P24018 and previous config saved to /var/cache/conftool/dbconfig/20220331-170221-ladsgroup.json
* 16:58 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2010.codfw.wmnet
* 16:58 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
* 16:57 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6002.wikimedia.org
* 16:55 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2001.wikimedia.org
* 16:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
* 16:51 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
* 16:51 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns6002.wikimedia.org
* 16:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5002.wikimedia.org
* 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Maint', diff saved to https://phabricator.wikimedia.org/P24017 and previous config saved to /var/cache/conftool/dbconfig/20220331-164717-ladsgroup.json
* 16:47 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
* 16:47 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
* 16:42 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns5002.wikimedia.org
* 16:42 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4002.wikimedia.org
* 16:37 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
* 16:37 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5001.wikimedia.org
* 16:33 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns4002.wikimedia.org
* 16:33 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
* 16:33 duesen: creating /var/run/php/use-config-schema  on canaries mw1415, mw1438, and mw1448 to enable config schema loading ([[phab:T304460|T304460]])
* 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Maint', diff saved to https://phabricator.wikimedia.org/P24016 and previous config saved to /var/cache/conftool/dbconfig/20220331-163213-ladsgroup.json
* 16:28 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns5001.wikimedia.org
* 16:28 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6001.wikimedia.org
* 16:25 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
* 16:25 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1002.wikimedia.org
* 16:20 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns6001.wikimedia.org
* 16:19 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns1002.wikimedia.org
* 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Maint', diff saved to https://phabricator.wikimedia.org/P24015 and previous config saved to /var/cache/conftool/dbconfig/20220331-161709-ladsgroup.json
* 16:17 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2002.wikimedia.org
* 16:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
* 16:11 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dns2002.wikimedia.org
* 16:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
* 15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:44 mmandere: pool cp6016 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
* 15:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 15:15 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 15:13 mmandere: pool cp5009 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:11 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:10 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 12 hosts with reason: reboot for update [[phab:T304938|T304938]]
* 15:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5009.eqsin.wmnet with OS buster
* 15:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on 12 hosts with reason: reboot for update [[phab:T304938|T304938]]
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on durum[1001-1002].eqiad.wmnet with reason: reboot for update [[phab:T304938|T304938]]
* 15:05 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on durum[1001-1002].eqiad.wmnet with reason: reboot for update [[phab:T304938|T304938]]
* 15:05 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:57 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
* 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh6002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh6002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh6001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh6001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh5002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh5002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh5001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh5001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:50 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:47 mmandere: depool cp6016 for reimage - [[phab:T290005|T290005]]
* 14:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh4002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh4002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh4001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh4001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:39 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5009.eqsin.wmnet with reason: host reimage
* 14:36 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5009.eqsin.wmnet with reason: host reimage
* 14:22 duesen: (late) about 5 hours ago, I removed /var/run/php/use-config-schema  from mw1415 to disable config schema loading ([[phab:T304460|T304460]])
* 14:09 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5009.eqsin.wmnet with OS buster
* 14:05 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5009.eqsin.wmnet with OS buster
* 14:03 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5009.eqsin.wmnet with OS buster
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:02 moritzm: installing vim security updates on buster
* 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1002.wikimedia.org
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:56 Lucas_WMDE: UTC afternoon backport+config window done
* 13:55 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/includes/changetags/ChangeTags.php: Backport: [[gerrit:775437{{!}}ChangeTags: Use localizer with correct page title to parse messages (T302754)]] (duration: 00m 51s)
* 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:53 mmandere: depool cp5009 for reimage - [[phab:T290005|T290005]]
* 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1002.wikimedia.org
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2001.wikimedia.org
* 13:51 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/resources/src/mediawiki.special.createaccount/HtmlformChecker.js: Backport: [[gerrit:775432{{!}}Fix error/warning boxes on signup form (T305098)]] (duration: 00m 50s)
* 13:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2001.wikimedia.org
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/CentralAuth/includes/Special/GlobalUsersPager.php: Backport: [[gerrit:775436{{!}}Revert "GlobalUsersPager: add gu_id to GROUP BY"]] (duration: 00m 50s)
* 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:20 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/tests/phpunit/structure/SpecialPageFatalTest.php: Backport: [[gerrit:775435{{!}}Revert "Add SpecialPageFatalTest to @group Database"]] (no-op) (duration: 00m 50s)
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:755453{{!}}Configure `mul` language code on Test Wikidata and its clients (T297393)]] (2/2) (duration: 00m 50s)
* 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:755453{{!}}Configure `mul` language code on Test Wikidata and its clients (T297393)]] (1/2) (duration: 00m 51s)
* 13:03 mmandere: pool cp4023 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 12:53 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4023.ulsfo.wmnet with OS buster
* 12:53 mmandere: pool cp3057 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 12:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3057.esams.wmnet with OS buster
* 12:48 XioNoX: analytics1-b/c/d-eqiad: replace firewall filter with strict uRPF - [[phab:T298087|T298087]]
* 12:31 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4023.ulsfo.wmnet with reason: host reimage
* 12:28 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4023.ulsfo.wmnet with reason: host reimage
* 12:25 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P24013 and previous config saved to /var/cache/conftool/dbconfig/20220331-122247-marostegui.json
* 12:22 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
* 12:12 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4023.ulsfo.wmnet with OS buster
* 12:07 mmandere: depool cp4023 for reimage - [[phab:T290005|T290005]]
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P24012 and previous config saved to /var/cache/conftool/dbconfig/20220331-120742-marostegui.json
* 12:04 moritzm: installing wireshark security updates
* 11:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3057.esams.wmnet with OS buster
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P24011 and previous config saved to /var/cache/conftool/dbconfig/20220331-115235-marostegui.json
* 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
* 11:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
* 11:39 mmandere: depool cp3057 for reimage - [[phab:T290005|T290005]]
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P24010 and previous config saved to /var/cache/conftool/dbconfig/20220331-113730-marostegui.json
* 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
* 11:19 moritzm: installing libpcap security updates
* 11:16 mmandere: pool cp3056 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 11:08 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS buster
* 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main'.
* 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:53 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:53 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:44 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
* 10:41 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
* 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P24009 and previous config saved to /var/cache/conftool/dbconfig/20220331-102819-marostegui.json
* 10:26 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
* 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
* 10:14 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS buster
* 10:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main'.
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P24007 and previous config saved to /var/cache/conftool/dbconfig/20220331-101314-marostegui.json
* 10:12 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb1002.eqiad.wmnet
* 10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host miscweb1002.eqiad.wmnet
* 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2002.codfw.wmnet
* 10:00 mmandere: pool cp4029 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 10:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P24006 and previous config saved to /var/cache/conftool/dbconfig/20220331-095809-marostegui.json
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host miscweb2002.codfw.wmnet
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P24005 and previous config saved to /var/cache/conftool/dbconfig/20220331-095319-marostegui.json
* 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24004 and previous config saved to /var/cache/conftool/dbconfig/20220331-095228-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P24003 and previous config saved to /var/cache/conftool/dbconfig/20220331-094304-marostegui.json
* 09:43 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4029.ulsfo.wmnet with OS buster
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24002 and previous config saved to /var/cache/conftool/dbconfig/20220331-093725-root.json
* 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
* 09:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS buster
* 09:25 duesen: removed /var/run/php/use-config-schema from mwdebug1002 to disable config schema loading ([[phab:T304460|T304460]])
* 09:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
* 09:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1002.eqiad.wmnet
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24001 and previous config saved to /var/cache/conftool/dbconfig/20220331-092221-root.json
* 09:21 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4029.ulsfo.wmnet with reason: host reimage
* 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
* 09:18 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4029.ulsfo.wmnet with reason: host reimage
* 09:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1002.eqiad.wmnet
* 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
* 09:16 duesen: created /var/run/php/use-config-schema on canary mw1415 to enable config schema loading ([[phab:T304460|T304460]])
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T297189|T297189]])', diff saved to https://phabricator.wikimedia.org/P24000 and previous config saved to /var/cache/conftool/dbconfig/20220331-091626-marostegui.json
* 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 09:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
* 09:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
* 09:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
* 09:08 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
* 09:08 duesen: created /var/run/php/use-config-schema on mwdebug1002 to enable config schema loading ([[phab:T304460|T304460]])
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23999 and previous config saved to /var/cache/conftool/dbconfig/20220331-090717-root.json
* 09:02 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4029.ulsfo.wmnet with OS buster
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp1001.wikimedia.org
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
* 08:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS stretch
* 08:57 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
* 08:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
* 08:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp1001.wikimedia.org
* 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
* 08:53 mmandere: depool cp4029 for reimage - [[phab:T290005|T290005]]
* 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1001.eqiad.wmnet
* 08:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1001.eqiad.wmnet
* 08:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
* 08:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:40 XioNoX: analytics1-a-eqiad: replace firewall filter with strict uRPF - [[phab:T298087|T298087]]
* 08:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
* 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp2001.wikimedia.org
* 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:35 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.5 refs [[phab:T300204|T300204]]
* 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp2001.wikimedia.org
* 08:30 hashar@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/OATHAuth/src/OATHUserRepository.php: Backport: [[gerrit:774996{{!}}Revert "OATHUserRepository: Stop handling legacy single-key" (T305029)]] (duration: 00m 51s)
* 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23997 and previous config saved to /var/cache/conftool/dbconfig/20220331-082525-marostegui.json
* 08:25 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS buster
* 08:19 daniel@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/GrowthExperiments/modules/ext.growthExperiments.PostEdit/index.js: Backport: [[gerrit:775370{{!}}Post-edit dialog: check for presence of preferences.topicFilters (T305057)]] (duration: 00m 53s)
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23996 and previous config saved to /var/cache/conftool/dbconfig/20220331-081020-marostegui.json
* 08:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23995 and previous config saved to /var/cache/conftool/dbconfig/20220331-075515-marostegui.json
* 07:41 mmandere: depool cp3056 for reimage - [[phab:T290005|T290005]]
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23994 and previous config saved to /var/cache/conftool/dbconfig/20220331-074010-marostegui.json
* 07:30 daniel@deploy1002: Synchronized multiversion/defines.php: Config: [[gerrit:772937{{!}}Set MW_USE_CONFIG_SCHEMA constant if file exists. (T304460)]] (duration: 00m 52s)
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:18 moritzm: updating libapache2-mod-auth-cas on buster hosts
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
* 06:48 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
* 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23993 and previous config saved to /var/cache/conftool/dbconfig/20220331-063429-ladsgroup.json
* 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23992 and previous config saved to /var/cache/conftool/dbconfig/20220331-061923-ladsgroup.json
* 06:12 marostegui: dbmaint s5@eqiad [[phab:T300381|T300381]]
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 [[phab:T303798|T303798]]', diff saved to https://phabricator.wikimedia.org/P23991 and previous config saved to /var/cache/conftool/dbconfig/20220331-060820-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23990 and previous config saved to /var/cache/conftool/dbconfig/20220331-060517-marostegui.json
* 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23989 and previous config saved to /var/cache/conftool/dbconfig/20220331-060509-marostegui.json
* 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23988 and previous config saved to /var/cache/conftool/dbconfig/20220331-060418-ladsgroup.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write [[phab:T303798|T303798]]', diff saved to https://phabricator.wikimedia.org/P23987 and previous config saved to /var/cache/conftool/dbconfig/20220331-060122-root.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - [[phab:T303798|T303798]]', diff saved to https://phabricator.wikimedia.org/P23986 and previous config saved to /var/cache/conftool/dbconfig/20220331-060042-root.json
* 06:00 marostegui: Starting s5 eqiad failover from db1130 to db1100 - [[phab:T303798|T303798]]
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23985 and previous config saved to /var/cache/conftool/dbconfig/20220331-055004-marostegui.json
* 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23984 and previous config saved to /var/cache/conftool/dbconfig/20220331-054913-ladsgroup.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23983 and previous config saved to /var/cache/conftool/dbconfig/20220331-053459-marostegui.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23981 and previous config saved to /var/cache/conftool/dbconfig/20220331-051954-marostegui.json
* 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23980 and previous config saved to /var/cache/conftool/dbconfig/20220331-044859-ladsgroup.json
* 04:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 04:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23979 and previous config saved to /var/cache/conftool/dbconfig/20220331-044851-ladsgroup.json
* 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 [[phab:T303798|T303798]]', diff saved to https://phabricator.wikimedia.org/P23978 and previous config saved to /var/cache/conftool/dbconfig/20220331-043906-marostegui.json
* 04:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 22 hosts with reason: Primary switchover s5 [[phab:T303798|T303798]]
* 04:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 22 hosts with reason: Primary switchover s5 [[phab:T303798|T303798]]
* 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23977 and previous config saved to /var/cache/conftool/dbconfig/20220331-043346-ladsgroup.json
* 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23976 and previous config saved to /var/cache/conftool/dbconfig/20220331-041841-ladsgroup.json
* 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23975 and previous config saved to /var/cache/conftool/dbconfig/20220331-040940-ladsgroup.json
* 04:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 04:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23974 and previous config saved to /var/cache/conftool/dbconfig/20220331-040916-ladsgroup.json
* 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23973 and previous config saved to /var/cache/conftool/dbconfig/20220331-040336-ladsgroup.json
* 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23972 and previous config saved to /var/cache/conftool/dbconfig/20220331-035411-ladsgroup.json
* 03:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23971 and previous config saved to /var/cache/conftool/dbconfig/20220331-034709-marostegui.json
* 03:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 03:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 03:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23970 and previous config saved to /var/cache/conftool/dbconfig/20220331-034701-marostegui.json
* 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23969 and previous config saved to /var/cache/conftool/dbconfig/20220331-033906-ladsgroup.json
* 03:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23968 and previous config saved to /var/cache/conftool/dbconfig/20220331-033156-marostegui.json
* 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23967 and previous config saved to /var/cache/conftool/dbconfig/20220331-032401-ladsgroup.json
* 03:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23966 and previous config saved to /var/cache/conftool/dbconfig/20220331-031651-marostegui.json
* 03:15 ejegg: civicrm revision changed from {{Gerrit|a6f49bb3}} to {{Gerrit|84c737b6}}
* 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23965 and previous config saved to /var/cache/conftool/dbconfig/20220331-030531-ladsgroup.json
* 03:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 03:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23964 and previous config saved to /var/cache/conftool/dbconfig/20220331-030523-ladsgroup.json
* 03:04 eileen: civicrm revision changed from {{Gerrit|a9c323af}} to {{Gerrit|a6f49bb3}}
* 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23963 and previous config saved to /var/cache/conftool/dbconfig/20220331-030321-ladsgroup.json
* 03:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 03:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23962 and previous config saved to /var/cache/conftool/dbconfig/20220331-030313-ladsgroup.json
* 03:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23961 and previous config saved to /var/cache/conftool/dbconfig/20220331-030146-marostegui.json
* 02:50 catrope@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: [[gerrit:773966{{!}}Code style-only change to MWConfigCacheGenerator.php]] (duration: 00m 52s)
* 02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23960 and previous config saved to /var/cache/conftool/dbconfig/20220331-025018-ladsgroup.json
* 02:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23959 and previous config saved to /var/cache/conftool/dbconfig/20220331-024808-ladsgroup.json
* 02:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23958 and previous config saved to /var/cache/conftool/dbconfig/20220331-023513-ladsgroup.json
* 02:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23957 and previous config saved to /var/cache/conftool/dbconfig/20220331-023303-ladsgroup.json
* 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23956 and previous config saved to /var/cache/conftool/dbconfig/20220331-022008-ladsgroup.json
* 02:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23955 and previous config saved to /var/cache/conftool/dbconfig/20220331-021758-ladsgroup.json
* 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23954 and previous config saved to /var/cache/conftool/dbconfig/20220331-021450-ladsgroup.json
* 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
* 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
* 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
* 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23953 and previous config saved to /var/cache/conftool/dbconfig/20220331-021413-ladsgroup.json
* 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23952 and previous config saved to /var/cache/conftool/dbconfig/20220331-020643-ladsgroup.json
* 02:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 02:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23951 and previous config saved to /var/cache/conftool/dbconfig/20220331-020635-ladsgroup.json
* 01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23950 and previous config saved to /var/cache/conftool/dbconfig/20220331-015908-ladsgroup.json
* 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23949 and previous config saved to /var/cache/conftool/dbconfig/20220331-015130-ladsgroup.json
* 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23948 and previous config saved to /var/cache/conftool/dbconfig/20220331-014403-ladsgroup.json
* 01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P23947 and previous config saved to /var/cache/conftool/dbconfig/20220331-014140-marostegui.json
* 01:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 01:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 01:38 eileen: revision changed from {{Gerrit|4bb3ec09}} to {{Gerrit|a9c323af}}
* 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23946 and previous config saved to /var/cache/conftool/dbconfig/20220331-013625-ladsgroup.json
* 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23945 and previous config saved to /var/cache/conftool/dbconfig/20220331-012858-ladsgroup.json
* 01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23944 and previous config saved to /var/cache/conftool/dbconfig/20220331-012734-marostegui.json
* 01:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 01:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23943 and previous config saved to /var/cache/conftool/dbconfig/20220331-012726-marostegui.json
* 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23942 and previous config saved to /var/cache/conftool/dbconfig/20220331-012650-ladsgroup.json
* 01:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 01:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 01:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23941 and previous config saved to /var/cache/conftool/dbconfig/20220331-012637-ladsgroup.json
* 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23940 and previous config saved to /var/cache/conftool/dbconfig/20220331-012120-ladsgroup.json
* 01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23939 and previous config saved to /var/cache/conftool/dbconfig/20220331-011221-marostegui.json
* 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23938 and previous config saved to /var/cache/conftool/dbconfig/20220331-011132-ladsgroup.json
* 00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23937 and previous config saved to /var/cache/conftool/dbconfig/20220331-005716-marostegui.json
* 00:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23936 and previous config saved to /var/cache/conftool/dbconfig/20220331-005627-ladsgroup.json
* 00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23935 and previous config saved to /var/cache/conftool/dbconfig/20220331-004211-marostegui.json
* 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23934 and previous config saved to /var/cache/conftool/dbconfig/20220331-004122-ladsgroup.json
* 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23933 and previous config saved to /var/cache/conftool/dbconfig/20220331-003914-ladsgroup.json
* 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23932 and previous config saved to /var/cache/conftool/dbconfig/20220331-003906-ladsgroup.json
* 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23931 and previous config saved to /var/cache/conftool/dbconfig/20220331-003834-ladsgroup.json
* 00:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23930 and previous config saved to /var/cache/conftool/dbconfig/20220331-003826-ladsgroup.json
* 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23929 and previous config saved to /var/cache/conftool/dbconfig/20220331-002401-ladsgroup.json
* 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23928 and previous config saved to /var/cache/conftool/dbconfig/20220331-002321-ladsgroup.json
* 00:17 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.1-1_source.changes  # [[phab:T299705|T299705]]
* 00:13 eileen: revision changed from {{Gerrit|951ffb1d}} to {{Gerrit|4bb3ec09}}
* 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23927 and previous config saved to /var/cache/conftool/dbconfig/20220331-000856-ladsgroup.json
* 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23926 and previous config saved to /var/cache/conftool/dbconfig/20220331-000816-ladsgroup.json


== 2023-02-02 ==
* 22:58 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1080.eqiad.wmnet with OS bullseye
* 22:15 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1079.eqiad.wmnet
* 22:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS bullseye
* 22:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
* 22:00 brett@cumin2002: conftool action : set/pooled

== 2022-03-30 ==
* 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23925 and previous config saved to /var/cache/conftool/dbconfig/20220330-235351-ladsgroup.json
* 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23924 and previous config saved to /var/cache/conftool/dbconfig/20220330-235311-ladsgroup.json
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23923 and previous config saved to /var/cache/conftool/dbconfig/20220330-235143-ladsgroup.json
* 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23922 and previous config saved to /var/cache/conftool/dbconfig/20220330-235131-ladsgroup.json
* 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23921 and previous config saved to /var/cache/conftool/dbconfig/20220330-233625-ladsgroup.json
* 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23920 and previous config saved to /var/cache/conftool/dbconfig/20220330-232120-ladsgroup.json
* 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23919 and previous config saved to /var/cache/conftool/dbconfig/20220330-230914-ladsgroup.json
* 23:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 23:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23918 and previous config saved to /var/cache/conftool/dbconfig/20220330-230905-ladsgroup.json
* 23:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23917 and previous config saved to /var/cache/conftool/dbconfig/20220330-230803-marostegui.json
* 23:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 23:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 23:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23916 and previous config saved to /var/cache/conftool/dbconfig/20220330-230755-marostegui.json
* 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23915 and previous config saved to /var/cache/conftool/dbconfig/20220330-230615-ladsgroup.json
* 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23914 and previous config saved to /var/cache/conftool/dbconfig/20220330-230408-ladsgroup.json
* 23:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 23:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23913 and previous config saved to /var/cache/conftool/dbconfig/20220330-230336-ladsgroup.json
* 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23912 and previous config saved to /var/cache/conftool/dbconfig/20220330-225401-ladsgroup.json
* 22:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23911 and previous config saved to /var/cache/conftool/dbconfig/20220330-225250-marostegui.json
* 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23910 and previous config saved to /var/cache/conftool/dbconfig/20220330-224831-ladsgroup.json
* 22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23909 and previous config saved to /var/cache/conftool/dbconfig/20220330-223856-ladsgroup.json
* 22:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23908 and previous config saved to /var/cache/conftool/dbconfig/20220330-223745-marostegui.json
* 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23907 and previous config saved to /var/cache/conftool/dbconfig/20220330-223325-ladsgroup.json
* 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23906 and previous config saved to /var/cache/conftool/dbconfig/20220330-222351-ladsgroup.json
* 22:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23905 and previous config saved to /var/cache/conftool/dbconfig/20220330-222240-marostegui.json
* 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23904 and previous config saved to /var/cache/conftool/dbconfig/20220330-221820-ladsgroup.json
* 22:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
* 21:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
* 21:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 21:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
* 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23903 and previous config saved to /var/cache/conftool/dbconfig/20220330-211806-ladsgroup.json
* 21:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 21:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
* 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23902 and previous config saved to /var/cache/conftool/dbconfig/20220330-211758-ladsgroup.json
* 21:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 21:03 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
* 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23900 and previous config saved to /var/cache/conftool/dbconfig/20220330-210253-ladsgroup.json
* 20:56 ejegg: updated fundraising python tools from {{Gerrit|8f5119f6}} to {{Gerrit|af97fc4a}}
* 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23899 and previous config saved to /var/cache/conftool/dbconfig/20220330-205529-ladsgroup.json
* 20:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23898 and previous config saved to /var/cache/conftool/dbconfig/20220330-205521-ladsgroup.json
* 20:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 20:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23897 and previous config saved to /var/cache/conftool/dbconfig/20220330-204748-ladsgroup.json
* 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23896 and previous config saved to /var/cache/conftool/dbconfig/20220330-204016-ladsgroup.json
* 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23895 and previous config saved to /var/cache/conftool/dbconfig/20220330-203243-ladsgroup.json
* 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23894 and previous config saved to /var/cache/conftool/dbconfig/20220330-203035-ladsgroup.json
* 20:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23893 and previous config saved to /var/cache/conftool/dbconfig/20220330-203028-ladsgroup.json
* 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23892 and previous config saved to /var/cache/conftool/dbconfig/20220330-202511-ladsgroup.json
* 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23891 and previous config saved to /var/cache/conftool/dbconfig/20220330-201522-ladsgroup.json
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23890 and previous config saved to /var/cache/conftool/dbconfig/20220330-201006-ladsgroup.json
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23889 and previous config saved to /var/cache/conftool/dbconfig/20220330-200236-marostegui.json
* 20:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 20:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
* 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23888 and previous config saved to /var/cache/conftool/dbconfig/20220330-200229-marostegui.json
* 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23887 and previous config saved to /var/cache/conftool/dbconfig/20220330-200017-ladsgroup.json
* 19:56 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test-eqiad cluster: Reboot kafka nodes
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23886 and previous config saved to /var/cache/conftool/dbconfig/20220330-194723-marostegui.json
* 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23885 and previous config saved to /var/cache/conftool/dbconfig/20220330-194512-ladsgroup.json
* 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23884 and previous config saved to /var/cache/conftool/dbconfig/20220330-193218-marostegui.json
* 19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23883 and previous config saved to /var/cache/conftool/dbconfig/20220330-192355-ladsgroup.json
* 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 19:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23882 and previous config saved to /var/cache/conftool/dbconfig/20220330-192347-ladsgroup.json
* 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 ([[phab:T298557|T298557]])', diff saved to https://phabricator.wikimedia.org/P23881 and previous config saved to /var/cache/conftool/dbconfig/20220330-191713-marostegui.json
* 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23880 and previous config saved to /var/cache/conftool/dbconfig/20220330-190842-ladsgroup.json
* 18:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23879 and previous config saved to /var/cache/conftool/dbconfig/20220330-185337-ladsgroup.json
* 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23878 and previous config saved to /var/cache/conftool/dbconfig/20220330-184458-ladsgroup.json
* 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23877 and previous config saved to /var/cache/conftool/dbconfig/20220330-184445-ladsgroup.json
* 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23876 and previous config saved to /var/cache/conftool/dbconfig/20220330-183832-ladsgroup.json
* 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23875 and previous config saved to /var/cache/conftool/dbconfig/20220330-182940-ladsgroup.json
* 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P23874 and previous config saved to /var/cache/conftool/dbconfig/20220330-182537-ladsgroup.json
* 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23873 and previous config saved to /var/cache/conftool/dbconfig/20220330-181435-ladsgroup.json
* 18:11 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test-eqiad cluster: Reboot kafka nodes
* 18:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
* 18:03 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
* 18:01 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
* 18:00 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host zookeeper-test1002.eqiad.wmnet
* 18:00 razzi@cumin1001: START - Cookbook sre.hosts
* 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:02 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:10 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
* 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2002.codfw
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 14 hosts with reason: Maintenance
* 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 14 hosts with reason: Maintenance
* 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 11:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22287 and previous config saved to /var/cache/conftool/dbconfig/20220310-110253-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22286 and previous config saved to /var/cache/conftool/dbconfig/20220310-105807-marostegui.json
* 10:48 jbond: re-enable puppet fleet wide
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22285 and previous config saved to /var/cache/conftool/dbconfig/20220310-104748-marostegui.json
* 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 10:44 akosiaris: reboot rdb2009 for upgrades
* 10:44 jbond: disable puppet fleet wide
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22284 and previous config saved to /var/cache/conftool/dbconfig/20220310-104302-marostegui.json
* 10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22283 and previous config saved to /var/cache/conftool/dbconfig/20220310-103243-marostegui.json
* 10:30 moritzm: failover ganeti master for drmrs/B13 to ganeti6004
* 10:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
* 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22282 and previous config saved to /var/cache/conftool/dbconfig/20220310-102757-marostegui.json
* 10:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
* 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22281 and previous config saved to /var/cache/conftool/dbconfig/20220310-101738-marostegui.json
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22280 and previous config saved to /var/cache/conftool/dbconfig/20220310-101133-marostegui.json
* 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22279 and previous config saved to /var/cache/conftool/dbconfig/20220310-101125-marostegui.json
* 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22278 and previous config saved to /var/cache/conftool/dbconfig/20220310-095620-marostegui.json
* 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22277 and previous config saved to /var/cache/conftool/dbconfig/20220310-094115-marostegui.json
* 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22276 and previous config saved to /var/cache/conftool/dbconfig/20220310-092742-marostegui.json
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T300775|T300775]])', diff saved to https://phabricator.wikimedia.org/P22275 and previous config saved to /var/cache/conftool/dbconfig/20220310-092735-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22274 and previous config saved to /var/cache/conftool/dbconfig/20220310-092610-marostegui.json
* 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22273 and previous config saved to /var/cache/conftool/dbconfig/20220310-091807-marostegui.json
* 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T298294|T298294]])', diff saved to https://phabricator.wikimedia.org/P22272 and previous config saved to /var/cache/conftool/dbconfig/20220310-091759-marostegui.json
* 09:16 moritzm: failover ganeti master for drmrs/B12 to ganeti6003
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22271 and previous config saved to /var/cache/conftool/dbconfig/20220310-091230-marostegui.json
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 09:04 jmm@cumin2002:


== 2022-03-01 ==
== 2023-02-01 ==
* 22:51 inflatador: [[phab:T276198|T276198]] reenabled puppet on elastic1052.eqiad.wmnet
* 23:45 zabe@deploy1002: Finished scap: Backport for [[gerrit:885908{{!}}Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)]] (duration: 08m 07s)
* 22:37 inflatador: [[phab:T276198|T276198]] rebooting elastic1052.eqiad.wmnet to test failure condition
* 23:39 zabe@deploy1002: zabe: Backport for [[gerrit:885908{{!}}Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 22:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
* 23:37 zabe@deploy1002: Started scap: Backport for [[gerrit:885908{{!}}Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)]]
* 22:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
* 23:31 rzl@cumin2002: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P43574 and previous config saved to /var/cache/conftool/dbconfig/20230201-233140-rzl.json
* 22:
* 23:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
* 23:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
* 23:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.


==Archives==
See [[Server Admin Log/Archives]].
<noinclude>
<noinclude>

Latest revision as of 00:35, 3 February 2023

2023-02-03

  • 00:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye

2023-02-02

  • 22:58 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1080.eqiad.wmnet with OS bullseye
  • 22:15 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1079.eqiad.wmnet
  • 22:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS bullseye
  • 22:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
  • 22:00 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1078.eqiad.wmnet
  • 21:58 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment everywhere (T233004) (duration: 07m 58s)
  • 21:52 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment everywhere (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:50 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment everywhere (T233004)
  • 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1078.eqiad.wmnet with OS bullseye
  • 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 21:30 brennen: end of utc late backport & config window
  • 21:30 brennen@deploy1002: Finished scap: Backport for Enable client preferences everywhere (T327979) (duration: 11m 14s)
  • 21:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 21:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS bullseye
  • 21:22 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 21:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1077.eqiad.wmnet with OS bullseye
  • 21:21 brennen@deploy1002: brennen and nray: Backport for Enable client preferences everywhere (T327979) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 21:19 brennen@deploy1002: Started scap: Backport for Enable client preferences everywhere (T327979)
  • 21:18 brennen@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason everywhere (T233004) (duration: 12m 02s)
  • 21:07 brennen@deploy1002: brennen and dreamyjazz: Backport for Disable write old for CheckUserLog reason everywhere (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:06 brennen@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason everywhere (T233004)
  • 20:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS bullseye
  • 20:59 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1078.eqiad.wmnet with OS bullseye
  • 20:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 20:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 20:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS bullseye
  • 20:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1077.eqiad.wmnet with OS bullseye
  • 20:23 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.3-1+deb11u1_amd64.changes # T328280
  • 20:21 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.3-1_amd64.changes # T328280
  • 20:11 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004) (duration: 09m 39s)
  • 20:03 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:02 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2037.codfw.wmnet
  • 20:01 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004)
  • 19:55 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
  • 19:54 ryankemper: T328674 [Elastic] With puppet disabled on elastic* fleet, `ryankemper@elastic2037:~$ sudo run-puppet-agent --force` to verify changes in https://gerrit.wikimedia.org/r/886055
  • 19:30 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.21 refs T325584
  • 19:28 zabe@deploy1002: say aborted: (duration: 00m 03s)
  • 18:42 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment in group1 wikis (T233004) (duration: 08m 19s)
  • 18:36 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment in group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:34 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment in group1 wikis (T233004)
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Production (gitlab1004) to 15.7.6-ce.0
  • 18:08 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2043.codfw.wmnet with OS bullseye
  • 18:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1037.eqiad.wmnet with OS bullseye
  • 17:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2043.codfw.wmnet with reason: host reimage
  • 17:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2043.codfw.wmnet with reason: host reimage
  • 17:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 17:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 17:33 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2043.codfw.wmnet with OS bullseye
  • 17:32 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1037.eqiad.wmnet with OS bullseye
  • 17:29 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Production (gitlab1004) to 15.7.6-ce.0
  • 17:12 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 17:12 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:53 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 16:52 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 16:51 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 16:50 dancy@deploy1002: Installation of scap version "4.34.0" completed for 561 hosts
  • 16:50 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 16:50 dancy@deploy1002: Installing scap version "4.34.0" for 561 hosts
  • 16:50 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:49 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:48 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:47 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 16:46 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 16:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2007.codfw.wmnet
  • 16:18 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 16:17 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 16:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2007.codfw.wmnet
  • 16:17 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 16:16 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 16:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:15 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:10 volans: uploaded python3-wmflib_1.2.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica gitlab2002 to 15.7.6-ce.0
  • 15:40 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@e38efa6] (releasing): (no justification provided) (duration: 07m 01s)
  • 15:38 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:37 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:35 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:35 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:34 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica gitlab2002 to 15.7.6-ce.0
  • 15:33 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@e38efa6] (releasing): (no justification provided)
  • 15:24 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti3004
  • 15:17 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti3004
  • 15:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2006.codfw.wmnet
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004 was renamed as ganeti4004 - jmm@cumin2002"
  • 15:02 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004 was renamed as ganeti4004 - jmm@cumin2002"
  • 15:00 vgutierrez: rolling restart of varnish in cache::text - T315676
  • 14:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2006.codfw.wmnet
  • 14:55 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 14:45 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 14:39 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2005.codfw.wmnet
  • 14:29 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 14:25 moritzm: installing containerd security updates on codfw k8s nodes
  • 14:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2005.codfw.wmnet
  • 13:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
  • 13:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=cdn
  • 13:10 kharlan: Deployed security patch for T328643
  • 13:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1076.eqiad.wmnet with OS bullseye
  • 13:04 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:03 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:03 kharlan: Deployed security patch for T328643
  • 13:02 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2004.codfw.wmnet
  • 13:00 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2004.codfw.wmnet
  • 12:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 12:47 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:46 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 12:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:42 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:39 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:29 btullis@deploy1002: Finished deploy [analytics/superset/deploy@5175ad7]: Production deployment for numpy downgrade (duration: 00m 42s)
  • 12:29 claime: Work ongoing on m2 and m3
  • 12:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2003.codfw.wmnet
  • 12:29 btullis@deploy1002: Started deploy [analytics/superset/deploy@5175ad7]: Production deployment for numpy downgrade
  • 12:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1076.eqiad.wmnet with OS bullseye
  • 12:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2003.codfw.wmnet
  • 12:08 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:08 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:46 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:42 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:41 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:41 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:40 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:39 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:38 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix | tee T328634-namespaceDupes-4.out # T328634 – made some progress then errored out again
  • 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix --add-prefix=T328634/ | tee T328634-namespaceDupes-3.out # T328634 – seemed to finish the first 20 pages and then go into an infinite loop, I Ctrl+Ced it
  • 11:28 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix --add-prefix=T328634/ | tee T328634-namespaceDupes-2.out # T328634 – another error but made more progress
  • 11:23 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix | tee T328634-namespaceDupes.out # T328634 – failed quickly, details in task
  • 11:22 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 11:22 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:02 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2002.codfw.wmnet
  • 10:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2002.codfw.wmnet
  • 10:17 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:11 moritzm: restarting FPM on mw canaries to pick up tiff security updates
  • 10:04 moritzm: installing tiff security updates
  • 09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2001.codfw.wmnet
  • 09:55 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 09:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 09:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2001.codfw.wmnet
  • 09:40 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 09:40 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 398143
  • 09:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 398143
  • 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica gitlab1004 to 15.7.6
  • 09:13 apergos: UTC morning backport and config training window done
  • 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 09:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 09:11 elukey: roll restart of eventgate-main pods in wikikube eqiad/codfw to pick up new stream configs - T328576
  • 08:57 ariel@deploy1002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630) (duration: 10m 56s)
  • 08:48 ariel@deploy1002: ariel and aishik: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:46 ariel@deploy1002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630)
  • 08:39 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica gitlab1004 to 15.7.6
  • 08:37 tgr@deploy1002: Finished scap: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370) (duration: 14m 26s)
  • 08:27 tgr@deploy1002: tgr: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:23 tgr@deploy1002: Started scap: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)
  • 06:17 kart_: Updated cxserver to 2023-02-02-004918-production (T129470, T172035, T327842)
  • 06:16 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:13 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:09 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:00 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet
  • 03:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS bullseye
  • 03:21 ejegg: payments-wiki upgraded from f20a2208 to 53d1a58d
  • 02:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
  • 02:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
  • 02:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 02:14 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5024.eqsin.wmnet with OS bullseye
  • 01:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 01:55 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
  • 01:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS bullseye
  • 01:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 01:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 01:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1075.eqiad.wmnet with OS bullseye
  • 01:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 01:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 01:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 01:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 01:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1075.eqiad.wmnet with OS bullseye
  • 00:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS bullseye
  • 00:06 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet
  • 00:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5022.eqsin.wmnet with OS bullseye

2023-02-01

  • 23:45 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004) (duration: 08m 07s)
  • 23:39 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 23:37 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)
  • 23:31 rzl@cumin2002: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P43574 and previous config saved to /var/cache/conftool/dbconfig/20230201-233140-rzl.json
  • 23:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 23:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 23:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 23:17 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.21 refs T325584 (duration: 06m 57s)
  • 23:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21 refs T325584
  • 23:01 zabe@deploy1002: Finished scap: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601) (duration: 07m 45s)
  • 22:55 zabe@deploy1002: zabe: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 22:53 zabe@deploy1002: Started scap: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601)
  • 22:49 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004) (duration: 13m 03s)
  • 22:47 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 22:40 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5022.eqsin.wmnet with OS bullseye
  • 22:38 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:36 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004)
  • 22:32 kindrobot: close UTC late backport window
  • 22:31 kindrobot@deploy1002: Finished scap: Backport for Enable client preferences for group1 (T327979) (duration: 10m 37s)
  • 22:22 kindrobot@deploy1002: nray and kindrobot: Backport for Enable client preferences for group1 (T327979) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:21 kindrobot@deploy1002: Started scap: Backport for Enable client preferences for group1 (T327979)
  • 22:14 kindrobot@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template for all wikis (T299612) (duration: 18m 14s)
  • 21:57 kindrobot@deploy1002: kindrobot and sbailey: Backport for Enable Linter write namespace, tag and template for all wikis (T299612) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:57 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:56 kindrobot@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template for all wikis (T299612)
  • 21:53 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 21:52 kindrobot@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason on group 0 (T233004) (duration: 14m 53s)
  • 21:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 21:39 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:39 kindrobot@deploy1002: dreamyjazz and kindrobot: Backport for Disable write old for CheckUserLog reason on group 0 (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:37 kindrobot@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason on group 0 (T233004)
  • 21:32 kindrobot@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318) (duration: 13m 56s)
  • 21:26 eevans@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:26 eevans@puppetmaster1001: conftool action : get/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:26 eevans@puppetmaster1001: conftool action : get/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:24 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 21:20 kindrobot@deploy1002: arlolra and kindrobot: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:18 kindrobot@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318)
  • 21:14 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 21:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3065.esams.wmnet with OS bullseye
  • 21:03 kindrobot: start UTC late backport deployment window
  • 21:02 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 20:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3065.esams.wmnet with reason: host reimage
  • 20:44 eevans@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 20:43 urandom: depooling sessionstore —codfw— in preparation for Cassandra restarts — T327675
  • 20:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3065.esams.wmnet with reason: host reimage
  • 20:40 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3064.esams.wmnet
  • 20:38 eevans@puppetmaster1001: conftool action : get/pooled; selector: dnsdisc=$SERVICE,name=$DC
  • 20:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS bullseye
  • 20:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3065.esams.wmnet with OS bullseye
  • 20:21 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3063.esams.wmnet
  • 20:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3064.esams.wmnet with reason: host reimage
  • 20:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS bullseye
  • 20:08 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3064.esams.wmnet with reason: host reimage
  • 20:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5031.eqsin.wmnet,service=ats-be
  • 20:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5031.eqsin.wmnet,service=cdn
  • 20:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5031.eqsin.wmnet with OS bullseye
  • 19:53 dancy: The train is blocked on T328601
  • 19:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS bullseye
  • 19:49 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs T325584 (duration: 06m 36s)
  • 19:49 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet
  • 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS bullseye
  • 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
  • 19:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
  • 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs T325584
  • 19:41 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-be
  • 19:41 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=cdn
  • 19:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
  • 19:33 dancy@deploy1002: deploy-promote aborted: (duration: 11m 58s)
  • 19:33 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.21 refs T325584 (duration: 03m 38s)
  • 19:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
  • 19:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21 refs T325584
  • 19:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
  • 19:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 19:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS bullseye
  • 19:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 19:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 19:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS bullseye
  • 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 19:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS bullseye
  • 19:02 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3060.esams.wmnet
  • 19:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS bullseye
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 18:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 18:55 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:55 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS bullseye
  • 18:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 18:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS bullseye
  • 18:39 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts puppetmaster2003.codfw.wmnet
  • 18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 18:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 18:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS bullseye
  • 18:31 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3059.esams.wmnet
  • 18:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS bullseye
  • 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 18:29 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster2003.codfw.wmnet
  • 18:29 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5021.eqsin.wmnet with OS bullseye
  • 18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 18:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp1075.eqiad.wmnet with reason: downtimed for idrac firmware testing
  • 18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp1075.eqiad.wmnet with reason: downtimed for idrac firmware testing
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5030.eqsin.wmnet,service=ats-be
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5030.eqsin.wmnet,service=cdn
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
  • 18:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS bullseye
  • 18:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3058.esams.wmnet
  • 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS bullseye
  • 18:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5030.eqsin.wmnet with OS bullseye
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43573 and previous config saved to /var/cache/conftool/dbconfig/20230201-181036-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43572 and previous config saved to /var/cache/conftool/dbconfig/20230201-181031-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43571 and previous config saved to /var/cache/conftool/dbconfig/20230201-181024-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43570 and previous config saved to /var/cache/conftool/dbconfig/20230201-181016-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43569 and previous config saved to /var/cache/conftool/dbconfig/20230201-181011-root.json
  • 18:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 18:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43568 and previous config saved to /var/cache/conftool/dbconfig/20230201-175531-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43567 and previous config saved to /var/cache/conftool/dbconfig/20230201-175526-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43566 and previous config saved to /var/cache/conftool/dbconfig/20230201-175519-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43565 and previous config saved to /var/cache/conftool/dbconfig/20230201-175511-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43564 and previous config saved to /var/cache/conftool/dbconfig/20230201-175506-root.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43563 and previous config saved to /var/cache/conftool/dbconfig/20230201-175446-root.json
  • 17:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 17:41 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS bullseye
  • 17:40 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3057.esams.wmnet
  • 17:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3057.esams.wmnet with OS bullseye
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43562 and previous config saved to /var/cache/conftool/dbconfig/20230201-174026-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43561 and previous config saved to /var/cache/conftool/dbconfig/20230201-174021-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43560 and previous config saved to /var/cache/conftool/dbconfig/20230201-174015-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43559 and previous config saved to /var/cache/conftool/dbconfig/20230201-174007-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43558 and previous config saved to /var/cache/conftool/dbconfig/20230201-174001-root.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43557 and previous config saved to /var/cache/conftool/dbconfig/20230201-173941-root.json
  • 17:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
  • 17:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43555 and previous config saved to /var/cache/conftool/dbconfig/20230201-172521-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43554 and previous config saved to /var/cache/conftool/dbconfig/20230201-172516-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43553 and previous config saved to /var/cache/conftool/dbconfig/20230201-172510-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43552 and previous config saved to /var/cache/conftool/dbconfig/20230201-172502-root.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43551 and previous config saved to /var/cache/conftool/dbconfig/20230201-172456-root.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43550 and previous config saved to /var/cache/conftool/dbconfig/20230201-172436-root.json
  • 17:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS bullseye
  • 17:22 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3056.esams.wmnet
  • 17:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS bullseye
  • 17:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS bullseye
  • 17:15 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43549 and previous config saved to /var/cache/conftool/dbconfig/20230201-171016-root.json
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43548 and previous config saved to /var/cache/conftool/dbconfig/20230201-171011-root.json
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43547 and previous config saved to /var/cache/conftool/dbconfig/20230201-171005-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43546 and previous config saved to /var/cache/conftool/dbconfig/20230201-170957-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43545 and previous config saved to /var/cache/conftool/dbconfig/20230201-170951-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43544 and previous config saved to /var/cache/conftool/dbconfig/20230201-170931-root.json
  • 16:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:57 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43543 and previous config saved to /var/cache/conftool/dbconfig/20230201-165512-root.json
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43542 and previous config saved to /var/cache/conftool/dbconfig/20230201-165506-root.json
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43541 and previous config saved to /var/cache/conftool/dbconfig/20230201-165500-root.json
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43540 and previous config saved to /var/cache/conftool/dbconfig/20230201-165452-root.json
  • 16:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43539 and previous config saved to /var/cache/conftool/dbconfig/20230201-165446-root.json
  • 16:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3057.esams.wmnet with OS bullseye
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43538 and previous config saved to /var/cache/conftool/dbconfig/20230201-165426-root.json
  • 16:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43536 and previous config saved to /var/cache/conftool/dbconfig/20230201-164007-root.json
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43535 and previous config saved to /var/cache/conftool/dbconfig/20230201-164002-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43534 and previous config saved to /var/cache/conftool/dbconfig/20230201-163955-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43533 and previous config saved to /var/cache/conftool/dbconfig/20230201-163947-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43532 and previous config saved to /var/cache/conftool/dbconfig/20230201-163941-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43531 and previous config saved to /var/cache/conftool/dbconfig/20230201-163921-root.json
  • 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS bullseye
  • 16:31 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 16:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 16:25 jynus: reloaded apache on mailman
  • 16:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 15:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5019.eqsin.wmnet with OS bullseye
  • 15:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 14:56 sukhe: depooled cp1075.eqiad.wmnet for idrac firmware upgrade testing
  • 14:55 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 14:55 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 14:52 awight: EU deployment window complete
  • 14:48 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:48 awight@deploy1002: Finished scap: Backport for wmf-config: add new revision-score streams for EventGate main (T317768) (duration: 08m 25s)
  • 14:47 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:41 awight@deploy1002: elukey and awight: Backport for wmf-config: add new revision-score streams for EventGate main (T317768) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2136 db2158 db2157 es2026 db2106 db2146 T327404', diff saved to https://phabricator.wikimedia.org/P43530 and previous config saved to /var/cache/conftool/dbconfig/20230201-144152-root.json
  • 14:40 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:40 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:40 awight@deploy1002: Started scap: Backport for wmf-config: add new revision-score streams for EventGate main (T317768)
  • 14:39 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:39 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:37 awight@deploy1002: Finished scap: Backport for Add cswiki to desktop-improvements group. (T328154) (duration: 09m 22s)
  • 14:29 awight@deploy1002: jdrewniak and awight: Backport for Add cswiki to desktop-improvements group. (T328154) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:28 awight@deploy1002: Started scap: Backport for Add cswiki to desktop-improvements group. (T328154)
  • 14:26 awight@deploy1002: Finished scap: Backport for Squashed diff to catch up to master (duration: 09m 07s)
  • 14:19 awight@deploy1002: awight and mlitn: Backport for Squashed diff to catch up to master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:17 awight@deploy1002: Started scap: Backport for Squashed diff to catch up to master
  • 14:11 awight@deploy1002: backport aborted: (duration: 06m 09s)
  • 14:11 awight@deploy1002: sync-world aborted: Backport for Squashed diff to catch up to master (duration: 03m 36s)
  • 14:09 awight@deploy1002: mlitn and awight: Backport for Squashed diff to catch up to master synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:07 awight@deploy1002: Started scap: Backport for Squashed diff to catch up to master
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3005.wikimedia.org
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 14:06 moritzm: updating perf on Bullseye hosts
  • 14:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3005.wikimedia.org
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5002.wikimedia.org
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:47 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5002.wikimedia.org
  • 13:21 moritzm: installing curl security updates on bullseye
  • 13:00 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:59 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2003.codfw.wmnet
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2003.codfw.wmnet
  • 12:16 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for testvm2002.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
  • 12:15 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for testvm2002.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
  • 11:29 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part III (T308932) (duration: 06m 43s)
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:22 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@e1ca693] (codfw): Allow stylesheets through CSP (duration: 01m 45s)
  • 11:21 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part II (T308932) (duration: 07m 04s)
  • 11:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:20 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@e1ca693] (codfw): Allow stylesheets through CSP
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 11:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@e1ca693] (eqiad): Allow stylesheets through CSP (duration: 00m 51s)
  • 11:16 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@e1ca693] (eqiad): Allow stylesheets through CSP
  • 11:14 ladsgroup@deploy1002: Synchronized wmf-config/ext-CirrusSearch.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part I (T308932) (duration: 07m 04s)
  • 11:01 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a8840b0] (duration: 01m 18s)
  • 11:00 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a8840b0]
  • 10:59 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0] (thin): Regular analytics weekly train THIN [analytics/refinery@a8840b0] (duration: 00m 05s)
  • 10:59 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0] (thin): Regular analytics weekly train THIN [analytics/refinery@a8840b0]
  • 10:58 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0]: Regular analytics weekly train [analytics/refinery@a8840b0] (duration: 04m 29s)
  • 10:54 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0]: Regular analytics weekly train [analytics/refinery@a8840b0]
  • 10:52 steve_munene: Deploying refinery for ops week
  • 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:42 zabe: start running migrateRevisionCommentTemp in remaining sections (for now except s3) in screens # T275246
  • 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb2002.codfw.wmnet with OS bullseye
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
  • 10:01 godog: upgrade grafana to 8.5.20 on cloudmetrics* - T328405
  • 09:57 godog: upgrade grafana to 8.5.20 on grafana1002 - T328405
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host krb2002.codfw.wmnet with OS bullseye
  • 09:47 godog: upgrade grafana to 8.5.20 on grafana2001 - T328405
  • 09:15 urbanecm: Clean sign up throttle for IP 195.113.145.2 (via resetAuthenticationThrottle.php; T328521)
  • 09:14 urbanecm@deploy1002: Finished scap: Backport for Add new throttle rule (T328521) (duration: 07m 24s)
  • 09:07 urbanecm@deploy1002: Started scap: Backport for Add new throttle rule (T328521)
  • 09:06 urbanecm@deploy1002: backport aborted: (duration: 00m 01s)
  • 09:05 ladsgroup@deploy1002: Finished scap: Backport for Create additional namespaces on shn.wikibooks (T327850) (duration: 15m 06s)
  • 08:54 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 08:54 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:52 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Create additional namespaces on shn.wikibooks (T327850) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 08:50 ladsgroup@deploy1002: Started scap: Backport for Create additional namespaces on shn.wikibooks (T327850)
  • 08:49 ladsgroup@deploy1002: Finished scap: Backport for Add a wordmark to trwiktionary (T328499) (duration: 08m 05s)
  • 08:45 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=k8s-ingress-staging
  • 08:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=k8s-ingress-staging
  • 08:42 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Add a wordmark to trwiktionary (T328499) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:41 ladsgroup@deploy1002: Started scap: Backport for Add a wordmark to trwiktionary (T328499)
  • 08:40 ladsgroup@deploy1002: Finished scap: Backport for Add mobile wordmark to cswiktionary (T328357) (duration: 12m 26s)
  • 08:29 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Add mobile wordmark to cswiktionary (T328357) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:27 ladsgroup@deploy1002: Started scap: Backport for Add mobile wordmark to cswiktionary (T328357)
  • 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:27 ladsgroup@deploy1002: Finished scap: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623) (duration: 09m 42s)
  • 08:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 08:19 ladsgroup@deploy1002: ladsgroup and krinkle: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:17 ladsgroup@deploy1002: Started scap: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623)
  • 08:14 ladsgroup@deploy1002: Finished scap: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700) (duration: 10m 15s)
  • 08:06 ladsgroup@deploy1002: phedenskog and ladsgroup: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:05 moritzm: installing libarchive security updates
  • 08:04 ladsgroup@deploy1002: Started scap: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700)
  • 08:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55821
  • 07:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55821
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43524 and previous config saved to /var/cache/conftool/dbconfig/20230201-073348-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P43523 and previous config saved to /var/cache/conftool/dbconfig/20230201-071841-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P43522 and previous config saved to /var/cache/conftool/dbconfig/20230201-070335-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43521 and previous config saved to /var/cache/conftool/dbconfig/20230201-064828-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43520 and previous config saved to /var/cache/conftool/dbconfig/20230201-064311-ladsgroup.json
  • 06:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:38 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3055.esams.wmnet
  • 00:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS bullseye
  • 00:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 00:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 00:02 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 00:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS bullseye

Archives

See Server Admin Log/Archives.