Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(mutante: gitlab2001 - fdisk /dev/vdb (g, w) (create partition table), (n, w) (create partition) ; mkfs.ext4 /dev/vdb1 (create filesystem); systemctl reset-failed (fix Icinga alert); mkdir /mnt/gitlab-backup; mount /dev/vdb1 /mnt/gitlab-backup ; blkid (get UUID); edit /etc/fstab and insert "UUID=c5235682-ac21-46a9-85ee-9603f694a6a4 /mnt/gitlab-backup ext4 errors=remount-ro 0 2" T274463)
imported>Stashbot
(mutante: DNS - new project language 'kcg'. 'Tyap is a regionally important dialect cluster of Plateau languages in Nigeria's Middle Belt, named after its prestige dialect. It is also known by its Hausa exonym as Katab or Kataf.' T305279)
Line 1: Line 1:
== 2022-03-31 ==
== 2022-04-01 ==
* 23:45 mutante: gitlab2001 - fdisk /dev/vdb (g, w) (create partition table), (n, w) (create partition) ; mkfs.ext4 /dev/vdb1 (create filesystem); systemctl reset-failed (fix Icinga alert); mkdir /mnt/gitlab-backup; mount /dev/vdb1 /mnt/gitlab-backup ; blkid (get UUID); edit /etc/fstab and insert "UUID=c5235682-ac21-46a9-85ee-9603f694a6a4 /mnt/gitlab-backup ext4 errors=remount-ro 0 2" [[phab:T274463|T274463]]
* 23:25 mutante: DNS - new project language 'kcg'. 'Tyap is a regionally important dialect cluster of Plateau languages in Nigeria's Middle Belt, named after its prestige dialect. It is also known by its Hausa exonym as Katab or Kataf.' [[phab:T305279|
* 23:27 mutante: gitlab2001 - rebooted on ganeti level (needed when adding new virtual hardware), then ran into the usual bug [[phab:T272555|T272555]] where you have to manually fix the interface in /etc/network/interfaces  [[phab:T274463|T274463]]
* 23:21 mutante: gitlab2001 (gitlab-replica.wikimedia.org) - rebooting to add new virtual disk [[phab:T274463|T274463]]
* 23:11 ejegg: updated payments-wiki from {{Gerrit|47d9bd27}} to {{Gerrit|6f888c28}}
* 23:01 bblack: esams->drmrs failover test begins - [[phab:T304089|T304089]]
* 22:34 moritzm: updated CAS to 6.4.6.2
* 22:28 mutante: ganeti - creating new 100G virtual disk on gitlab1001 [[phab:T274463|T274463]]
* 22:24 mutante: ganeti - creating new 100G virtual disk on gitlab2001 [[phab:T274463|T274463]]
* 22:16 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 22:03 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 22:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 21:51 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 21:48 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 21:40 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:19 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^(cp1075{{!}}cp1079{{!}}cp2035{{!}}cp3050{{!}}cp3051{{!}}cp3052{{!}}cp3054{{!}}cp4022{{!}}cp5013{{!}}cp5014{{!}}cp5015).*
* 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:17 bblack@cumin1001: conftool action : select; selector: name="^(cp1075{{!}}cp1079{{!}}cp2035{{!}}cp3050{{!}}cp3051{{!}}cp3052{{!}}cp3054{{!}}cp4022{{!}}cp5013{{!}}cp5014{{!}}cp5015).*"
* 21:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: [[gerrit:775876{{!}}Remove unused Flow config]] (duration: 00m 49s)
* 21:07 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp5012.eqsin.wmnet
* 21:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 21:06 thcipriani: utc late backport complete
* 21:03 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 20:59 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 bking@cumin1001: START - Cookbook sre.wdqs.reboot
* 20:56 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/GrowthExperiments/modules/ext.growthExperiments.Homepage.SuggestedEdits/MatchModeSelectWidget.less: Backport: [[gerrit:775371{{!}}Newcomer tasks: always align button and text to the right (T301825)]] (duration: 00m 50s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:49 thcipriani@deploy1002: Synchronized tests: Config (noop -- tests) (duration: 00m 50s)
* 20:47 thcipriani@deploy1002: Synchronized src/StaticSiteConfiguration.php: Config (noop -- comment change): [[gerrit:775427{{!}}phpcs: enable and fix PropertyDocumentation.MissingVar (T171115)]] (duration: 00m 50s)
* 20:46 thcipriani@deploy1002: Synchronized phpcs.xml: Config (noop): [[gerrit:775427{{!}}phpcs: enable and fix PropertyDocumentation.MissingVar (T171115)]] [[gerrit:775426{{!}}phpcs: rename test files to match class names (T171115)]] [[gerrit:775005{{!}}phpcs: enable rules that are already passing (T171115)]] (duration: 00m 49s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 mutante: reserving port 4017 for new k8s service request 'image-suggestions' [[phab:T304891|T304891]]
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:36 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:774500{{!}}Stop writing to $wmfLocalServices (T45956)]] (duration: 00m 50s)
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:29 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:774499{{!}}Migrate $wmfLocalServices to $wmgLocalServices (T45956)]] (duration: 00m 51s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2007.codfw.wmnet
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:22 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
* 20:22 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:774497{{!}}Start writing to $wmgLocalServices the same value as to $wmfLocalServices (T45956)]] (duration: 00m 50s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:21 mutante: contint2002 - reboot (insetup host)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:18 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
* 20:17 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2007.codfw.wmnet
* 20:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
* 20:16 thcipriani@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: Config: [[gerrit:774019{{!}}Migrate $wmfServiceConfig to $wmgServiceConfig (T45956)]] (duration: 00m 50s)
* 20:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
* 20:12 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5001.eqsin.wmnet
* 20:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1075.eqiad.wmnet
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2376.codfw.wmnet
* 20:10 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2374.codfw.wmnet
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:09 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2252.codfw.wmnet
* 20:08 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet
* 20:08 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2251.codfw.wmnet
* 20:07 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
* 20:07 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5001.eqsin.wmnet
* 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5014.eqsin.wmnet
* 20:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2376.codfw.wmnet
* 20:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2374.codfw.wmnet
* 20:04 mutante: mw2271,mw2222 - canary appserver, rebooting
* 20:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2035.codfw.wmnet
* 20:04 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4005.ulsfo.wmnet
* 20:01 mutante: mw2251,mw2252 - canary appserver, rebooting
* 20:00 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4005.ulsfo.wmnet
* 19:59 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2272.codfw.wmnet
* 19:59 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
* 19:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2252.codfw.wmnet
* 19:57 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2251.codfw.wmnet
* 19:55 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
* 19:46 mutante: phab2001 - systemctl restart ssh-phab
* 19:45 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
* 19:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
* 19:43 rzl: Rolling-restarted zotero to un-wedge wedged pods with offscale high CPU
* 19:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
* 19:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
* 19:38 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2008.codfw.wmnet
* 19:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5014.eqsin.wmnet
* 19:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
* 19:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
* 19:28 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
* 19:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5015.eqsin.wmnet
* 19:26 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2008.codfw.wmnet
* 19:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
* 19:24 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
* 19:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
* 19:23 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
* 19:21 cwhite: remove openjdk-8-jre from eqiad logstash nodes [[phab:T301770|T301770]]
* 19:21 mutante: phab2001 - powercycling via mgmt
* 19:20 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
* 19:20 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
* 19:19 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
* 19:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
* 19:15 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
* 19:15 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
* 19:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
* 19:14 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
* 19:14 mutante: phab2001 - git-ssh.codfw - rebooting - might cause pybal alert
* 19:13 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5015.eqsin.wmnet
* 19:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4022.ulsfo.wmnet
* 19:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
* 19:09 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
* 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2035.codfw.wmnet
* 19:07 bblack@cumin1001: conftool action : set/pooled=yes; selector: cluster=ml_staging
* 19:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1075.eqiad.wmnet
* 19:07 bblack@cumin1001: conftool action : set/weight=1; selector: cluster=ml_staging
* 19:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5013.eqsin.wmnet
* 19:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
* 19:06 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5002.eqsin.wmnet
* 19:05 mutante: doc.wikimedia.org - short downtime due to maintenance, rebooting doc1001
* 19:02 mutante: testreduce1001 - needed manual nginx restart after reboot to make https://parsoid-rt-tests.wikimedia.org/ work again
* 19:01 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5002.eqsin.wmnet
* 19:00 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.1-1+deb11u1_source.changes
* 19:00 mutante: testreduce1001 - rebooting
* 18:59 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4006.ulsfo.wmnet
* 18:59 mutante: https://parsoid-rt-tests.wikimedia.org/ - short downtime due to maintenance
* 18:59 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4022.ulsfo.wmnet
* 18:56 mutante: scandium - rebooting
* 18:54 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4006.ulsfo.wmnet
* 18:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
* 18:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5013.eqsin.wmnet
* 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3054.esams.wmnet
* 18:50 mutante: mwdebug1001 - rebooting
* 18:49 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
* 18:43 duesen: removing /var/run/php/use-config-schema  from canaries mw1415, mw1438, and mw1448 to disable config schema loading ([[phab:T304460|T304460]])
* 18:41 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
* 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3054.esams.wmnet
* 18:36 mutante: gerrit-replica.wikimedia.org short downtime, rebooting gerrit2001
* 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:23 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/TimedMediaHandler/resources/ext.tmh.player.styles.less: Backport: [[gerrit:775443{{!}}Set noflip for css rule that needs it (T305156)]] (duration: 00m 51s)
* 18:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:20 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2009.codfw.wmnet
* 18:19 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@ba88f51]: 0.3.109 (duration: 07m 24s)
* 18:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns2001.wikimedia.org
* 18:13 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.109` on canary `wdqs1003`; proceeding to rest of fleet
* 18:11 ryankemper@deploy1002: Started deploy [wdqs/wdqs@ba88f51]: 0.3.109
* 18:11 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.109`. Pre-deploy tests passing on canary `wdqs1003`
* 18:08 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2009.codfw.wmnet
* 18:03 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
* 17:57 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
* 17:52 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host authdns2001.wikimedia.org
* 17:47 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns1001.wikimedia.org
* 17:41 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host authdns1001.wikimedia.org
* 17:37 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
* 17:31 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1001.wikimedia.org
* 17:30 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
* 17:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5003.eqsin.wmnet
* 17:25 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns1001.wikimedia.org
* 17:25 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2001.wikimedia.org
* 17:24 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5003.eqsin.wmnet
* 17:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet
* 17:17 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet
* 17:17 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
* 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Maint', diff saved to https://phabricator.wikimedia.org/P24019 and previous config saved to /var/cache/conftool/dbconfig/20220331-171724-ladsgroup.json
* 17:10 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
* 17:10 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2010.codfw.wmnet
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Maint', diff saved to https://phabricator.wikimedia.org/P24018 and previous config saved to /var/cache/conftool/dbconfig/20220331-170221-ladsgroup.json
* 16:58 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2010.codfw.wmnet
* 16:58 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
* 16:57 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6002.wikimedia.org
* 16:55 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2001.wikimedia.org
* 16:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
* 16:51 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
* 16:51 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns6002.wikimedia.org
* 16:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5002.wikimedia.org
* 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Maint', diff saved to https://phabricator.wikimedia.org/P24017 and previous config saved to /var/cache/conftool/dbconfig/20220331-164717-ladsgroup.json
* 16:47 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
* 16:47 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
* 16:42 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns5002.wikimedia.org
* 16:42 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4002.wikimedia.org
* 16:37 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
* 16:37 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5001.wikimedia.org
* 16:33 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns4002.wikimedia.org
* 16:33 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
* 16:33 duesen: creating /var/run/php/use-config-schema  on canaries mw1415, mw1438, and mw1448 to enable config schema loading ([[phab:T304460|T304460]])
* 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Maint', diff saved to https://phabricator.wikimedia.org/P24016 and previous config saved to /var/cache/conftool/dbconfig/20220331-163213-ladsgroup.json
* 16:28 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns5001.wikimedia.org
* 16:28 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6001.wikimedia.org
* 16:25 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
* 16:25 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1002.wikimedia.org
* 16:20 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns6001.wikimedia.org
* 16:19 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns1002.wikimedia.org
* 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Maint', diff saved to https://phabricator.wikimedia.org/P24015 and previous config saved to /var/cache/conftool/dbconfig/20220331-161709-ladsgroup.json
* 16:17 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2002.wikimedia.org
* 16:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
* 16:11 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dns2002.wikimedia.org
* 16:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
* 15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:44 mmandere: pool cp6016 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
* 15:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 15:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 15:15 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
* 15:13 mmandere: pool cp5009 with HAProxy as TLS termination layer - [[phab:T290005|T290005]]
* 15:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:11 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:10 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 12 hosts with reason: reboot for update [[phab:T304938|T304938]]
* 15:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5009.eqsin.wmnet with OS buster
* 15:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on 12 hosts with reason: reboot for update [[phab:T304938|T304938]]
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on durum[1001-1002].eqiad.wmnet with reason: reboot for update [[phab:T304938|T304938]]
* 15:05 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on durum[1001-1002].eqiad.wmnet with reason: reboot for update [[phab:T304938|T304938]]
* 15:05 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:57 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
* 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh6002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh6002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh6001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh6001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh5002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh5002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh5001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh5001.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:50 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:47 mmandere: depool cp6016 for reimage - [[phab:T290005|T290005]]
* 14:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh4002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh4002.wikimedia.org with reason: reboot for kernel update [[phab:T304938|T304938]]
* 14:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts


==Archives==

Revision as of 23:25, 1 April 2022

2022-04-01

  • 23:25 mutante: DNS - new project language 'kcg'. 'Tyap is a regionally important dialect cluster of Plateau languages in Nigeria's Middle Belt, named after its prestige dialect. It is also known by its Hausa exonym as Katab or Kataf.' T305279
  • 23:08 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 23:08 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 22:04 bblack: esams re-pooled - T304089
  • 20:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:48 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
  • 19:47 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
  • 19:44 mutante: rebooting parsoid canary appservers - wtp1025, wtp1026, parse2001, parse2002
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].codfw.wmnet
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=parse200[1-2].eqiad.wmnet
  • 19:38 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=parse200[1-2].eqiad.wmnet
  • 19:37 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=wtp102[5-6].eqiad.wmnet
  • 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
  • 19:36 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1450.eqiad.wmnet
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=varnish-fe
  • 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-tls
  • 19:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet,service=ats-be
  • 19:16 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw144[7-9].eqiad.wmnet
  • 19:16 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
  • 19:01 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].eqiad.wmnet
  • 19:00 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp2036.codfw.wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1414.wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw141[4-8].wmnet
  • 19:00 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw1414.wmnet
  • 18:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw141[4-8].wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2036.codfw.wmnet
  • 13:05 dcausse: resetting jvmquake flag on all wdqs hosts
  • 12:52 dcausse: restarting blazegraph on wdqs1006 and resetting jvmquake warning flag
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 11:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 10:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 10:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 10:47 vgutierrez: reboot acme-chief instances to catch up on kernel upgrades
  • 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6002.drmrs.wmnet
  • 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6002.drmrs.wmnet
  • 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir6001.drmrs.wmnet
  • 10:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir6001.drmrs.wmnet
  • 10:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet
  • 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet
  • 10:06 vgutierrez: vgutierrez@puppetmaster2001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
  • 10:04 vgutierrez: vgutierrez@puppetmaster1001:~$ sudo -i rm /var/run/confd-template/.ml-staging-ctrl*.err
  • 10:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet
  • 09:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet
  • 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet
  • 09:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet
  • 09:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet
  • 09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet
  • 09:35 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir3002.esams.wmnet
  • 09:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet
  • 09:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet
  • 09:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet
  • 09:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
  • 09:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
  • 09:10 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ncredir2001.codfw.wmnet
  • 08:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2001.codfw.wmnet
  • 08:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet
  • 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet
  • 08:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet
  • 08:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
  • 08:48 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ncredir1001.eqiad.wmnet
  • 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
  • 08:44 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 08:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:42 vgutierrez: rolling restart of ncredir instances to catch up on kernel upgrades
  • 06:54 XioNoX: traffic engineering in drmrs to prevent link saturation

Archives

See Server Admin Log/Archives.