Server Admin Log: Difference between revisions
== 2023-05-30 ==
* 23:38 zabe@deploy1002: Finished scap: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]] (duration: 08m 00s)
* 23:31 zabe@deploy1002: zabe: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:30 zabe@deploy1002: Started scap: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]]
* 22:22 ejegg: civicrm upgraded from {{Gerrit|415aa7e5}} to {{Gerrit|5905a403}}
* 21:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] (duration: 07m 48s)
* 21:50 samtar@deploy1002: jforrester and samtar: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 21:48 samtar@deploy1002: Started scap: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]]
* 20:58 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:924569{{!}}Add WANCache to ParserOutputPageProperties::finalize
== 2023-05-29 ==
* 15:19 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
* 15:19 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
* 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 14:18 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 13:57 vgutierrez@puppetmaster1001: conftool action : set/weight=10; selector: name=dbproxy.*,dc=eqiad
* 11:25 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 11:24 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48618 and previous config saved to /var/cache/conftool/dbconfig/20230529-112242-root.json
* 11:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48617 and previous config saved to /var/cache/conftool/dbconfig/20230529-110737-root.json
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48616 and previous config saved to /var/cache/conftool/dbconfig/20230529-105233-root.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48615 and previous config saved to /var/cache/conftool/dbconfig/20230529-103728-root.json
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48614 and previous config saved to /var/cache/conftool/dbconfig/20230529-102223-root.json
* 10:07 vgutierrez: restarting pybal on lvs1018
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48612 and previous config saved to /var/cache/conftool/dbconfig/20230529-100719-root.json
* 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 10:05 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 10:05 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 10:04 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 10:04 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 10:03 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 10:03 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 10:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 10:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 10:00 vgutierrez: restarting pybal on lvs1020
* 09:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 09:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 09:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 09:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48611 and previous config saved to /var/cache/conftool/dbconfig/20230529-095214-root.json
* 09:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 09:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 09:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 09:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 09:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 09:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48610 and previous config saved to /var/cache/conftool/dbconfig/20230529-093709-root.json
* 09:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 09:31 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 09:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 09:29 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 09:13 godog: start partial rollout of cadvisor to eqiad/codfw (~10%) [[phab:T108027|T108027]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48609 and previous config saved to /var/cache/conftool/dbconfig/20230529-090216-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48608 and previous config saved to /var/cache/conftool/dbconfig/20230529-084711-root.json
* 08:45 godog: delete old raw blocks from thanos - [[phab:T337236|T337236]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48607 and previous config saved to /var/cache/conftool/dbconfig/20230529-083206-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48606 and previous config saved to /var/cache/conftool/dbconfig/20230529-081702-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48604 and previous config saved to /var/cache/conftool/dbconfig/20230529-080157-root.json
* 07:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48603 and previous config saved to /var/cache/conftool/dbconfig/20230529-074653-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48602 and previous config saved to /var/cache/conftool/dbconfig/20230529-073148-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48601 and previous config saved to /var/cache/conftool/dbconfig/20230529-071643-root.json
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s2, s3, s5 [[phab:T337446|T337446]]', diff saved to https://phabricator.wikimedia.org/P48598 and previous config saved to /var/cache/conftool/dbconfig/20230529-051043-root.json
== 2023-05-28 ==
* 13:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 13:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 13:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 13:16 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 06:12 marostegui: Change innodb_fast_shutdown to 0 on db1154 before downgrading [[phab:T337446|T337446]]
== 2023-05-27 ==
* 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 ([[phab:T337446|T337446]])
* 17:42 godog: silence systemd state alert flapping on stat1009 until monday
* 00:03 tzatziki: removing 1 file for legal compliance
== 2023-05-26 ==
* 23:48 tzatziki: removing 2 files for legal compliance
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
== 2023-05-25 ==
* 22:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] (duration: 09m 14s)
* 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:05 zabe@deploy1002: Started scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]]
* 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
* 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676
== 2023-05-24 ==
* 21:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] (duration: 09m 40s)
* 21:10 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:08 urbanecm@deploy1002:
== 2023-05-23 ==
* 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
* 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
* 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - [[phab:T324659|T324659]]
* 22:00 eileen: civicrm upgraded from {{Gerrit|11538e23}} to {{Gerrit|4251dfa1}}
* 21:26 ejegg: payments-wiki upgraded from {{Gerrit|a7567c6a}} to {{Gerrit|e02bc7c5}}
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d
== 2023-05-22 ==
* 23:29 eileen: civicrm upgraded from {{Gerrit|cc9593d0}} to {{Gerrit|7eae24d5}}
* 23:16 zabe@deploy1002: Finished scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] (duration: 06m 58s)
* 23:11 zabe@deploy1002: zabe: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:09 zabe@deploy1002: Started scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]]
== 2023-05-21 ==
* 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
== 2023-05-20 ==
* 18:25 effie: restart varnish cp3061
* 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
* 15:17 hoo@deploy1002: Finished scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] (duration: 08m 47s)
* 15:10 hoo@deploy1002: hoo: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:08 hoo@deploy1002: Started scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]]
* 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox
== 2023-05-19 ==
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 21:19 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1495.eqiad.wmnet
* 19:46 mutante: mw1469 - sudo pkill ffmpeg (per runbook)
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1469.eqiad.wmnet
* 19:45 mutante: depooled mw1469 from videoscaler, dedicating to just jobrunner
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1469.eqiad.wmnet
* 19:36 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b34c529]: (no justification provided) (duration: 00m 09s)
* 19:36 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b34c529]: (no justification provided)
* 16:55 mutante: mw2448 - scap pull - [[phab:T2334429|T2334429]]
* 15:31 taavi@deploy1002: Finished scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] (duration: 22m 02s)
* 15:21 taavi@deploy1002: legoktm and taavi: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:09 taavi@deploy1002: Started scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]]
* 15:06 legoktm@deploy1002: Finished scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] (duration: 09m 46s)
* 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
* 14:58 legoktm@deploy1002: legoktm: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 14:57 legoktm@deploy1002: Started scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]]
* 14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 14:35 sukhe: enable puppet on A:lvs, finished rolling out change
* 14:20 sukhe: disable puppet on A:lvs to roll out CR 910566
* 14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 13:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 10s)
* 13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
* 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad ([[phab:T322937|T322937]])
* 13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
* 11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
* 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
* 10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 10:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 10:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
* 10:07 moritzm: installing ncurses security updates
* 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:15 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
* 08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
* 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
* 08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
* 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
* 08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia [[phab:T330884|T330884]]
* 08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
* 07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
* 07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
* 07:11 moritzm: installing emacs security updates
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
* 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
* 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48381 and previous config saved to /var/cache/conftool/dbconfig/20230519-062520-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48380 and previous config saved to /var/cache/conftool/dbconfig/20230519-062435-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48379 and previous config saved to /var/cache/conftool/dbconfig/20230519-061228-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48378 and previous config saved to /var/cache/conftool/dbconfig/20230519-061016-root.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48377 and previous config saved to /var/cache/conftool/dbconfig/20230519-060931-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48376 and previous config saved to /var/cache/conftool/dbconfig/20230519-055723-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48375 and previous config saved to /var/cache/conftool/dbconfig/20230519-055511-root.json | |||
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48374 and previous config saved to /var/cache/conftool/dbconfig/20230519-055426-root.json | |||
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2027', diff saved to https://phabricator.wikimedia.org/P48373 and previous config saved to /var/cache/conftool/dbconfig/20230519-054952-root.json | |||
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 to es3 master', diff saved to https://phabricator.wikimedia.org/P48372 and previous config saved to /var/cache/conftool/dbconfig/20230519-054923-marostegui.json | |||
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P48371 and previous config saved to /var/cache/conftool/dbconfig/20230519-054758-root.json | |||
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2033 to es2 master', diff saved to https://phabricator.wikimedia.org/P48370 and previous config saved to /var/cache/conftool/dbconfig/20230519-054737-marostegui.json | |||
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P48369 and previous config saved to /var/cache/conftool/dbconfig/20230519-054503-root.json | |||
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master', diff saved to https://phabricator.wikimedia.org/P48368 and previous config saved to /var/cache/conftool/dbconfig/20230519-054403-marostegui.json | |||
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1121 from dbctl [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48367 and previous config saved to /var/cache/conftool/dbconfig/20230519-053719-marostegui.json | |||
== 2023-05-18 == | |||
* 23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | |||
* 22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]] | |||
* 22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress | |||
* 22:20 mutante: short down-time for zuul-merger on contint2001 | |||
* 21:47 mutante: maintenance for zuul (CI) on contint servers | |||
* 21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | |||
* 21:13 brennen@deploy1002: Finished scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] (duration: 09m 38s) | |||
* 21:05 brennen@deploy1002: brennen: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 21:03 brennen@deploy1002: Started scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] | |||
* 21:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] (duration: 08m 09s) | |||
* 20:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 20:53 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] | |||
* 20:36 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]] | |||
* 20:33 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]] | |||
* 20:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] (duration: 10m 25s) | |||
* 20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] | |||
* 18:57 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]] (duration: 00m 35s) | |||
* 18:56 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]] | |||
* 18:55 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]] | |||
* 18:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs [[phab:T330215|T330215]] | |||
* 18:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet | |||
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001" | |||
* 18:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001" | |||
* 18:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001" | |||
* 18:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001" | |||
* 18:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | |||
* 18:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]] | |||
* 18:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]] | |||
* 18:07 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]] | |||
* 18:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:59 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]] | |||
* 17:38 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 17:37 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 17:36 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 17:35 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 17:27 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 17:26 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 17:26 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 17:26 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. | |||
* 16:55 XioNoX: push new pfw policies - [[phab:T336896|T336896]] | |||
* 16:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply | |||
* 16:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply | |||
* 16:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye | |||
* 15:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply | |||
* 15:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply | |||
* 15:57 inflatador: bking@cumin1001 starting rolling restart of wcqs for java updates [[phab:T334470|T334470]] | |||
* 15:53 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage | |||
* 15:50 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage | |||
* 15:47 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) (duration: 00m 10s) | |||
* 15:47 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) | |||
* 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet | |||
* 15:37 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye | |||
* 15:31 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet | |||
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet | |||
* 15:25 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet | |||
* 15:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet | |||
* 15:19 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker | |||
* 15:18 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 15:18 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 15:17 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 15:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet | |||
* 15:15 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet | |||
* 15:09 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet | |||
* 15:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet | |||
* 15:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet | |||
* 15:03 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) (duration: 00m 06s) | |||
* 15:02 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) | |||
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 14:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet | |||
* 14:34 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker | |||
* 14:31 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 14:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | |||
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | |||
* 14:01 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-codfw | |||
* 13:59 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw | |||
* 13:52 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker | |||
* 13:50 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker | |||
* 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker | |||
* 13:47 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker | |||
* 13:18 TheresNoTime: closing backport window | |||
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] (duration: 08m 45s) | |||
* 13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 13:07 samtar@deploy1002: samtar and s-mukuti: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] | |||
* 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - [[phab:T332012|T332012]] (duration: 06m 19s) | |||
* 12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker | |||
* 12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker | |||
* 12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - [[phab:T332012|T332012]] (duration: 07m 00s) | |||
* 12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it | |||
* 12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet | |||
* 12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet | |||
* 12:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet | |||
* 12:35 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 12:34 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 12:28 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet | |||
* 12:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet | |||
* 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet | |||
* 12:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet | |||
* 12:12 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm | |||
* 12:11 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet | |||
* 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet | |||
* 12:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet | |||
* 11:56 topranks: reconfiguring DHCP relay function on eqiad core routers ([[phab:T320508|T320508]]) | |||
* 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet | |||
* 11:51 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet | |||
* 11:36 kart_: MinT: Update to 2023-05-18-060931-production and Set CT2_INTRA_THREADS to 0 ([[phab:T336483|T336483]]) | |||
* 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 11:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:20 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 11:11 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet | |||
* 11:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet | |||
* 10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet | |||
* 10:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet | |||
* 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet | |||
* 10:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk | |||
* 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk | |||
* 10:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet | |||
* 10:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-cache1001.eqiad.wmnet | |||
* 10:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet | |||
* 10:06 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync | |||
* 10:05 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync | |||
* 08:30 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 08:29 akosiaris: upgrade docker-registry to 2.8.2 on all registry hosts | |||
* 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 08:26 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet | |||
* 08:24 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync | |||
* 08:24 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync | |||
* 08:19 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync | |||
* 08:19 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync | |||
* 08:00 akosiaris: upgrade registry on registry2003 to 2.8.2 | |||
* 07:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet | |||
* 07:25 apergos: UTC morning backport and config training window done | |||
* 07:15 kartik@deploy1002: Finished scap: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 18s) | |||
* 07:07 kartik@deploy1002: kartik: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet | |||
* 07:06 kartik@deploy1002: Started scap: Backport for [[gerrit:920577{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] | |||
* 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance | |||
* 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance | |||
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1122 from dbctl [[phab:T336833|T336833]]', diff saved to https://phabricator.wikimedia.org/P48362 and previous config saved to /var/cache/conftool/dbconfig/20230518-060734-marostegui.json | |||
* 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance | |||
* 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance | |||
== 2023-05-17 == | |||
* 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001" | |||
* 22:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001" | |||
* 22:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 22:15 krinkle@deploy1002: Synchronized wmf-config/: [[phab:T332012|T332012]] (duration: 06m 51s) | |||
* 21:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet | |||
* 21:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye | |||
* 21:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye | |||
* 21:01 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Public policy" "Global Advocacy" "Zabe" --reason "per request [[:phab:T333842{{!}}T333842]]" | |||
* 20:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet | |||
* 20:32 urbanecm: UTC late B&C window done | |||
* 20:29 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] (duration: 11m 36s) | |||
* 20:19 urbanecm@deploy1002: urbanecm and matmarex and ksarabia and sgimeno: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw. | |||
* 20:17 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920784{{!}}GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134)]], [[gerrit:920732{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]], [[gerrit:920386{{!}}Enable zebra ab test in hewiki (T335972)]] | |||
* 20:15 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] (duration: 12m 06s) | |||
* 20:13 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet | |||
* 20:12 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet | |||
* 20:07 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet | |||
* 20:04 urbanecm@deploy1002: sgimeno and urbanecm: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:03 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920722{{!}}GrowthExperiments: enable add link frontend in 9th round wikis (T308134)]] | |||
* 19:55 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 19:54 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 19:54 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet | |||
* 19:50 ejegg: payments-wiki upgraded from {{Gerrit|8988a598}} to {{Gerrit|a7567c6a}} | |||
* 19:41 inflatador: bking@wdqs2012 depooling to attempt firmware update [[phab:T331297|T331297]] | |||
* 19:01 Amir1: Removing db1112 from zarcillo [[phab:T336332|T336332]] | |||
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1112.eqiad.wmnet | |||
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | |||
* 18:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | |||
* 18:48 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox | |||
* 18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1112.eqiad.wmnet | |||
* 18:34 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] (duration: 06m 22s) | |||
* 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | |||
* 18:11 otto@deploy1002: Finished deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795] (duration: 09m 14s) | |||
* 18:03 brennen: train 1.41.0-wmf.9 ([[phab:T330215|T330215]]): no current blockers, rolling to group1 as backup-backup conductor | |||
* 18:02 otto@deploy1002: Started deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795] | |||
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync | |||
* 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync | |||
* 17:19 brett: Maglev LVS scheduler rollout finished in esams - [[phab:T263797|T263797]] | |||
* 16:58 Guest4300: Running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --video --mime=video/mpeg --missing --error --stalled --throttle` on mwmaint1002 for [[phab:T244570|T244570]] | |||
* 16:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox | |||
* 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48356 and previous config saved to /var/cache/conftool/dbconfig/20230517-162444-ladsgroup.json | |||
* 16:21 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox | |||
* 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48355 and previous config saved to /var/cache/conftool/dbconfig/20230517-161929-ladsgroup.json | |||
* 16:18 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply | |||
* 16:17 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply | |||
* 16:14 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | |||
* 16:13 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply | |||
* 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48354 and previous config saved to /var/cache/conftool/dbconfig/20230517-160937-ladsgroup.json | |||
* 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48353 and previous config saved to /var/cache/conftool/dbconfig/20230517-160423-ladsgroup.json | |||
* 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 15:57 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply | |||
* 15:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply | |||
* 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48352 and previous config saved to /var/cache/conftool/dbconfig/20230517-155431-ladsgroup.json | |||
* 15:52 brett: Rolling out maglev LVS scheduler in esams - [[phab:T263797|T263797]] | |||
* 15:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply | |||
* 15:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply | |||
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48351 and previous config saved to /var/cache/conftool/dbconfig/20230517-154916-ladsgroup.json | |||
* 15:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48350 and previous config saved to /var/cache/conftool/dbconfig/20230517-153925-ladsgroup.json | |||
* 15:38 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48349 and previous config saved to /var/cache/conftool/dbconfig/20230517-153410-ladsgroup.json | |||
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48348 and previous config saved to /var/cache/conftool/dbconfig/20230517-153042-ladsgroup.json | |||
* 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance | |||
* 15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance | |||
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2032 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48347 and previous config saved to /var/cache/conftool/dbconfig/20230517-153010-ladsgroup.json | |||
* 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48346 and previous config saved to /var/cache/conftool/dbconfig/20230517-153004-ladsgroup.json | |||
* 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance | |||
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance | |||
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48345 and previous config saved to /var/cache/conftool/dbconfig/20230517-152945-ladsgroup.json | |||
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org | |||
* 15:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org | |||
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org | |||
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48344 and previous config saved to /var/cache/conftool/dbconfig/20230517-151458-ladsgroup.json | |||
* 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org | |||
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48343 and previous config saved to /var/cache/conftool/dbconfig/20230517-151438-ladsgroup.json | |||
* 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet | |||
* 15:07 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . | |||
* 15:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet | |||
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48342 and previous config saved to /var/cache/conftool/dbconfig/20230517-145952-ladsgroup.json | |||
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48341 and previous config saved to /var/cache/conftool/dbconfig/20230517-145932-ladsgroup.json | |||
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs101[6-9]*<nowiki>}</nowiki> and A:aqs | |||
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48340 and previous config saved to /var/cache/conftool/dbconfig/20230517-144446-ladsgroup.json | |||
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48339 and previous config saved to /var/cache/conftool/dbconfig/20230517-144425-ladsgroup.json | |||
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48338 and previous config saved to /var/cache/conftool/dbconfig/20230517-144025-ladsgroup.json | |||
* 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance | |||
* 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance | |||
* 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48337 and previous config saved to /var/cache/conftool/dbconfig/20230517-143949-ladsgroup.json | |||
* 14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance | |||
* 14:39 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - EventBus: produce to mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] (duration: 06m 20s) | |||
* 14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance | |||
* 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker | |||
* 14:36 moritzm: installing jackson-databind security updates | |||
* 14:34 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for [[phab:T336800|T336800]] (duration: 00m 09s) | |||
* 14:34 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for [[phab:T336800|T336800]] | |||
* 14:33 ottomata: EventBus: produce to mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] | |||
* 14:30 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync | |||
* 14:30 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync | |||
* 14:28 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync | |||
* 14:28 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync | |||
* 14:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync | |||
* 14:27 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync | |||
* 14:27 ottomata: rolling restart of eventgate-main to pick up new mediawiki.page_change.v1 stream config - [[phab:T336817|T336817]] | |||
* 14:17 elukey: run authdns-update for new ml-serve/ores discovery endpoints - [[phab:T336726|T336726]] | |||
* 14:15 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs101[6-9]*<nowiki>}</nowiki> and A:aqs | |||
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs101[2-5]*<nowiki>}</nowiki> and A:aqs | |||
* 14:14 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Declare mediawiki.page_change.v1 stream - [[phab:T336817|T336817]] (duration: 07m 30s) | |||
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:09 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:09 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:08 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet | |||
* 13:59 taavi@deploy1002: Finished scap: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] (duration: 07m 24s) | |||
* 13:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet | |||
* 13:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet | |||
* 13:54 taavi@deploy1002: matmarex and taavi: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:52 taavi@deploy1002: Started scap: Backport for [[gerrit:920582{{!}}Define $maintClass in maintenance script for compatibility (T317375)]] | |||
* 13:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet | |||
* 13:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet | |||
* 13:47 taavi@deploy1002: Finished scap: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] (duration: 08m 11s) | |||
* 13:42 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs101[2-5]*<nowiki>}</nowiki> and A:aqs | |||
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs102[0-1]*<nowiki>}</nowiki> and A:aqs | |||
* 13:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet | |||
* 13:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet | |||
* 13:40 taavi@deploy1002: taavi and maurelio: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 13:38 taavi@deploy1002: Started scap: Backport for [[gerrit:920244{{!}}dblists: Close akwiki (T336675)]] | |||
* 13:38 taavi@deploy1002: Finished scap: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] (duration: 07m 39s) | |||
* 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet | |||
* 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet | |||
* 13:32 taavi@deploy1002: stang and taavi: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 13:30 taavi@deploy1002: Started scap: Backport for [[gerrit:920396{{!}}plwiki: Show language selector in main page header (T336707)]] | |||
* 13:29 taavi@deploy1002: Finished scap: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] (duration: 09m 15s) | |||
* 13:25 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs102[0-1]*<nowiki>}</nowiki> and A:aqs | |||
* 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet | |||
* 13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet | |||
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P<nowiki>{</nowiki>aqs1011*<nowiki>}</nowiki> and A:aqs | |||
* 13:24 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 13:23 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 13:23 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 13:22 taavi@deploy1002: gtzatchkova and taavi: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:22 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker | |||
* 13:20 taavi@deploy1002: Started scap: Backport for [[gerrit:920296{{!}}Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760)]], [[gerrit:920306{{!}}Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)]] | |||
* 13:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. | |||
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. | |||
* 13:18 daniel@deploy1002: Finished scap: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] (duration: 11m 52s) | |||
* 13:17 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P<nowiki>{</nowiki>aqs1011*<nowiki>}</nowiki> and A:aqs | |||
* 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet | |||
* 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-canary | |||
* 13:07 daniel@deploy1002: daniel: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:06 daniel@deploy1002: Started scap: Backport for [[gerrit:920230{{!}}Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347)]], [[gerrit:920231{{!}}Use MultiHttpClient instead of VirtualRESTService. (T335347)]] | |||
* 13:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet | |||
* 13:00 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-canary | |||
* 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48335 and previous config saved to /var/cache/conftool/dbconfig/20230517-125952-ladsgroup.json | |||
* 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48334 and previous config saved to /var/cache/conftool/dbconfig/20230517-125824-ladsgroup.json | |||
* 12:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet | |||
* 12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet | |||
* 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001" | |||
* 12:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001" | |||
* 12:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox | |||
* 12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet | |||
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48333 and previous config saved to /var/cache/conftool/dbconfig/20230517-124446-ladsgroup.json | |||
* 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48332 and previous config saved to /var/cache/conftool/dbconfig/20230517-124318-ladsgroup.json | |||
* 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48331 and previous config saved to /var/cache/conftool/dbconfig/20230517-122940-ladsgroup.json | |||
* 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48330 and previous config saved to /var/cache/conftool/dbconfig/20230517-122812-ladsgroup.json | |||
* 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48329 and previous config saved to /var/cache/conftool/dbconfig/20230517-121434-ladsgroup.json | |||
* 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48328 and previous config saved to /var/cache/conftool/dbconfig/20230517-121306-ladsgroup.json | |||
* 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox | |||
* 12:11 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox | |||
* 12:06 topranks: Merging CR822439 and beginning bulk puppetdb -> netbox import to update host interfaces | |||
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48327 and previous config saved to /var/cache/conftool/dbconfig/20230517-115943-ladsgroup.json | |||
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance | |||
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance | |||
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48326 and previous config saved to /var/cache/conftool/dbconfig/20230517-115908-ladsgroup.json | |||
* 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48325 and previous config saved to /var/cache/conftool/dbconfig/20230517-115612-ladsgroup.json | |||
* 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance | |||
* 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance | |||
* 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48324 and previous config saved to /var/cache/conftool/dbconfig/20230517-115538-ladsgroup.json | |||
* 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48323 and previous config saved to /var/cache/conftool/dbconfig/20230517-115303-ladsgroup.json | |||
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48322 and previous config saved to /var/cache/conftool/dbconfig/20230517-114402-ladsgroup.json | |||
* 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48321 and previous config saved to /var/cache/conftool/dbconfig/20230517-114032-ladsgroup.json | |||
* 11:38 kart_: Update MinT to 2023-05-17-052844-production: Set CT2_USE_EXPERIMENTAL_PACKED_GEMM for better performance | |||
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48320 and previous config saved to /var/cache/conftool/dbconfig/20230517-113757-ladsgroup.json | |||
* 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48319 and previous config saved to /var/cache/conftool/dbconfig/20230517-113531-ladsgroup.json | |||
* 11:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48318 and previous config saved to /var/cache/conftool/dbconfig/20230517-112856-ladsgroup.json | |||
* 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 11:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48317 and previous config saved to /var/cache/conftool/dbconfig/20230517-112526-ladsgroup.json | |||
* 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48316 and previous config saved to /var/cache/conftool/dbconfig/20230517-112251-ladsgroup.json | |||
* 11:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48315 and previous config saved to /var/cache/conftool/dbconfig/20230517-112024-ladsgroup.json | |||
* 11:15 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48314 and previous config saved to /var/cache/conftool/dbconfig/20230517-111350-ladsgroup.json | |||
* 11:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48313 and previous config saved to /var/cache/conftool/dbconfig/20230517-111020-ladsgroup.json | |||
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48312 and previous config saved to /var/cache/conftool/dbconfig/20230517-110745-ladsgroup.json | |||
* 11:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply | |||
* 11:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply | |||
* 11:05 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply | |||
* 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48311 and previous config saved to /var/cache/conftool/dbconfig/20230517-110518-ladsgroup.json | |||
* 11:05 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply | |||
* 11:04 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply | |||
* 11:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply | |||
* 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2034 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48310 and previous config saved to /var/cache/conftool/dbconfig/20230517-110251-ladsgroup.json | |||
* 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance | |||
* 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance | |||
* 11:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply | |||
* 11:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply | |||
* 11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply | |||
* 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48309 and previous config saved to /var/cache/conftool/dbconfig/20230517-110130-ladsgroup.json | |||
* 11:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance | |||
* 11:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply | |||
* 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance | |||
* 11:00 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply | |||
* 11:00 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply | |||
* 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48308 and previous config saved to /var/cache/conftool/dbconfig/20230517-105957-ladsgroup.json | |||
* 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance | |||
* 10:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply | |||
* 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance | |||
* 10:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply | |||
* 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply | |||
* 10:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply | |||
* 10:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply | |||
* 10:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply | |||
* 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48307 and previous config saved to /var/cache/conftool/dbconfig/20230517-105012-ladsgroup.json | |||
* 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2033 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48306 and previous config saved to /var/cache/conftool/dbconfig/20230517-104519-ladsgroup.json | |||
* 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance | |||
* 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance | |||
* 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48305 and previous config saved to /var/cache/conftool/dbconfig/20230517-104454-ladsgroup.json | |||
* 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P48304 and previous config saved to /var/cache/conftool/dbconfig/20230517-103815-ladsgroup.json | |||
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48303 and previous config saved to /var/cache/conftool/dbconfig/20230517-103129-root.json | |||
* 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48302 and previous config saved to /var/cache/conftool/dbconfig/20230517-102948-ladsgroup.json | |||
* 10:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply | |||
* 10:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply | |||
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P48301 and previous config saved to /var/cache/conftool/dbconfig/20230517-102310-ladsgroup.json | |||
* 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply | |||
* 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply | |||
* 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply | |||
* 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply | |||
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48300 and previous config saved to /var/cache/conftool/dbconfig/20230517-101624-root.json | |||
* 10:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48299 and previous config saved to /var/cache/conftool/dbconfig/20230517-101442-ladsgroup.json | |||
* 10:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:08 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P48298 and previous config saved to /var/cache/conftool/dbconfig/20230517-100805-ladsgroup.json | |||
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48297 and previous config saved to /var/cache/conftool/dbconfig/20230517-100120-root.json | |||
* 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48296 and previous config saved to /var/cache/conftool/dbconfig/20230517-095936-ladsgroup.json | |||
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48295 and previous config saved to /var/cache/conftool/dbconfig/20230517-095443-ladsgroup.json | |||
* 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance | |||
* 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance | |||
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P48294 and previous config saved to /var/cache/conftool/dbconfig/20230517-095301-ladsgroup.json | |||
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48293 and previous config saved to /var/cache/conftool/dbconfig/20230517-094615-root.json | |||
* 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48292 and previous config saved to /var/cache/conftool/dbconfig/20230517-093928-ladsgroup.json | |||
* 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance | |||
* 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance | |||
* 09:39 elukey: roll restart pybal on lvs2010, lvs2009, lvs1020, lvs1019 to pick up a VIP (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/920219) - [[phab:T336726|T336726]] | |||
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48291 and previous config saved to /var/cache/conftool/dbconfig/20230517-093110-root.json | |||
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48290 and previous config saved to /var/cache/conftool/dbconfig/20230517-091606-root.json | |||
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1220 cleaning gtid_domain_id', diff saved to https://phabricator.wikimedia.org/P48289 and previous config saved to /var/cache/conftool/dbconfig/20230517-091407-root.json | |||
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48288 and previous config saved to /var/cache/conftool/dbconfig/20230517-085855-root.json | |||
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48287 and previous config saved to /var/cache/conftool/dbconfig/20230517-084350-root.json | |||
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48285 and previous config saved to /var/cache/conftool/dbconfig/20230517-082846-root.json | |||
* 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet | |||
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet | |||
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48284 and previous config saved to /var/cache/conftool/dbconfig/20230517-081341-root.json | |||
* 08:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 08:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 08:05 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 08:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48283 and previous config saved to /var/cache/conftool/dbconfig/20230517-075836-root.json | |||
* 07:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 07:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply | |||
* 07:48 moritzm: upgrading krb1001 to Bullseye [[phab:T331695|T331695]] | |||
* 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye | |||
* 07:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye | |||
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48278 and previous config saved to /var/cache/conftool/dbconfig/20230517-074332-root.json | |||
* 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 37468 | |||
* 07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 37468 | |||
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 4%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48277 and previous config saved to /var/cache/conftool/dbconfig/20230517-072827-root.json | |||
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for decommissioning', diff saved to https://phabricator.wikimedia.org/P48276 and previous config saved to /var/cache/conftool/dbconfig/20230517-072508-root.json | |||
* 07:19 kartik@deploy1002: Finished scap: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] (duration: 07m 22s) | |||
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48275 and previous config saved to /var/cache/conftool/dbconfig/20230517-071428-root.json | |||
* 07:13 kartik@deploy1002: trainbranchbot and kartik: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48274 and previous config saved to /var/cache/conftool/dbconfig/20230517-071322-root.json | |||
* 07:11 kartik@deploy1002: Started scap: Backport for [[gerrit:920625{{!}}Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"]] | |||
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48273 and previous config saved to /var/cache/conftool/dbconfig/20230517-071039-root.json | |||
* 07:09 kartik@deploy1002: Backport cancelled. | |||
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48272 and previous config saved to /var/cache/conftool/dbconfig/20230517-065923-root.json | |||
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48271 and previous config saved to /var/cache/conftool/dbconfig/20230517-065817-root.json | |||
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48270 and previous config saved to /var/cache/conftool/dbconfig/20230517-064419-root.json | |||
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48269 and previous config saved to /var/cache/conftool/dbconfig/20230517-064313-root.json | |||
* 06:40 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply | |||
* 06:39 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply | |||
* 06:39 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply | |||
* 06:38 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply | |||
* 06:37 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply | |||
* 06:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply | |||
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48268 and previous config saved to /var/cache/conftool/dbconfig/20230517-062914-root.json | |||
* 06:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply | |||
* 06:21 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply | |||
* 06:20 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply | |||
* 06:20 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply | |||
* 06:19 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply | |||
* 06:18 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply | |||
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48267 and previous config saved to /var/cache/conftool/dbconfig/20230517-061409-root.json | |||
* 06:01 volans: restarted ferm on ms-be1047 | |||
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48265 and previous config saved to /var/cache/conftool/dbconfig/20230517-055904-root.json | |||
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096', diff saved to https://phabricator.wikimedia.org/P48264 and previous config saved to /var/cache/conftool/dbconfig/20230517-055310-root.json | |||
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1115.eqiad.wmnet | |||
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" | |||
* 05:48 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" | |||
* 05:46 marostegui@cumin1001: START - Cookbook sre.dns.netbox | |||
* 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1115.eqiad.wmnet | |||
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1112 from dbctl [[phab:T336332|T336332]]', diff saved to https://phabricator.wikimedia.org/P48263 and previous config saved to /var/cache/conftool/dbconfig/20230517-052007-marostegui.json | |||
* 05:16 marostegui: Optimize s7 on dbstore1003 [[phab:T336733|T336733]] | |||
* 00:21 krinkle@deploy1002: Synchronized src/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 01s) | |||
* 00:15 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 14s) | |||
* 00:07 krinkle@deploy1002: Synchronized lib/: {{Gerrit|I4cfa4a2474b4e}} (duration: 06m 51s) | |||
== 2023-05-16 == | |||
* 20:59 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] (duration: 07m 18s) | |||
* 20:53 jdrewniak@deploy1002: jdrewniak and matmarex: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:52 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920237{{!}}Add maint script to opt out active users from the new topic tool (T317375)]] | |||
* 20:49 volans@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 20:49 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] (duration: 09m 19s) | |||
* 20:41 jdrewniak@deploy1002: jdrewniak: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 20:39 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920242{{!}}Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)]] | |||
* 20:36 jdrewniak@deploy1002: Finished scap: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] (duration: 07m 44s) | |||
* 20:30 jdrewniak@deploy1002: jdrewniak: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:30 brett: Rolling out maglev LVS scheduler in drmrs (for real this time) - [[phab:T263797|T263797]] | |||
* 20:29 jdrewniak@deploy1002: Started scap: Backport for [[gerrit:920240{{!}}Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)]] | |||
* 19:13 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:13 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 19:12 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 19:10 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 19:10 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 19:04 sukhe: dummy run of authdns-update to confirm new hosts | |||
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2003.wikimedia.org | |||
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 18:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 18:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox | |||
* 18:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw | |||
* 18:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw | |||
* 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2003.wikimedia.org | |||
* 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.* | |||
* 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.* | |||
* 18:50 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 18:50 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 18:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 18:47 ryankemper: [WDQS] Pooled `wdqs2012` | |||
* 18:46 ryankemper: [WDQS] Pooled `wdqs2006` (not sure why it was depooled) | |||
* 18:46 sukhe: homer "cr*-codfw*" commit "Gerrit: 920363 remove to-be decommissioned host dns2003": [[phab:T335777|T335777]] | |||
* 18:46 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 18:43 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 18:43 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 18:42 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002" | |||
* 18:41 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 18:41 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 18:36 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.74 208.80.153.107 ]: [[phab:T326688|T326688]] | |||
* 18:34 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | |||
* 18:28 sukhe: homer "cr*-codfw*" commit "Gerrit: 920358 add new DNS host dns2006": [[phab:T326688|T326688]] | |||
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bullseye | |||
* 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage | |||
* 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage | |||
* 18:01 sukhe: enable puppet on A:cp-text | |||
* 17:58 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply | |||
* 17:57 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply | |||
* 17:56 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply | |||
* 17:55 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply | |||
* 17:52 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply | |||
* 17:52 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply | |||
* 17:47 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001" | |||
* 17:46 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001" | |||
* 17:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye | |||
* 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:40 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001" | |||
* 17:40 moritzm: installing avahi security updates on buster | |||
* 17:39 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001" | |||
* 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:37 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet | |||
* 17:34 joal@deploy1002: Finished deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937] (duration: 00m 10s) | |||
* 17:34 joal@deploy1002: Started deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937] | |||
* 17:27 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 17:27 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:27 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001" | |||
* 17:27 brett: Rolling out maglev LVS scheduler in drmrs - [[phab:T263797|T263797]] | |||
* 17:26 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001" | |||
* 17:24 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001" | |||
* 17:19 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001" | |||
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2002.wikimedia.org | |||
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 17:17 volans@cumin1001: START - Cookbook sre.dns.netbox | |||
* 17:17 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 17:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox | |||
* 17:09 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2002.wikimedia.org | |||
* 17:00 sukhe: homer "cr*-codfw*" commit "Gerrit: 920320 remove to-be decommissioned host dns2002" [[phab:T335777|T335777]] | |||
* 16:59 moritzm: installing 5.10.179 kernels on Bullseye hosts | |||
* 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet | |||
* 16:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . | |||
* 16:30 volans: restarting wikibugs ( https://www.mediawiki.org/wiki/Wikibugs#Help ) | |||
* 16:06 mutante: gitlab-runner2003 - installed rsync client to debug an issue with rsync from inside containers, for comparison with running it from outside the container | |||
* 15:49 sukhe: run authdns-update for CR 920314 | |||
* 15:41 joal@deploy1002: Finished deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd] (duration: 00m 10s) | |||
* 15:41 joal@deploy1002: Started deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd] | |||
* 15:36 hashar: Some CI jobs started failing after an upgrade of some Jenkins plugins. I have upgraded a couple more and it seems to work now [[phab:T336775|T336775]] | |||
* 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]: [[phab:T326688|T326688]] | |||
* 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ] | |||
* 15:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply | |||
* 15:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply | |||
* 15:27 hashar: Restarting CI Jenkins | |||
* 15:26 Emperor: rebalance codfw swift rings [[phab:T335280|T335280]] | |||
* 15:18 hashar: CI Jenkins jobs are stalled following the plugins upgrade :/ | |||
* 15:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 15:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 15:03 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 14:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 14:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:49 moritzm: installing libxml2 security updates on buster | |||
* 14:48 sukhe: [done] "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": [[phab:T326688|T326688]] | |||
* 14:47 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 14:43 hashar: Restarting CI Jenkins | |||
* 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply | |||
* 14:42 sukhe: "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": [[phab:T326688|T326688]] | |||
* 14:36 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply | |||
* 14:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 14:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 14:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 14:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 14:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync | |||
* 14:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync | |||
* 14:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 14:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 14:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 14:26 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 14:26 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 14:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bullseye | |||
* 14:18 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 00m 45s) | |||
* 14:17 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) | |||
* 14:10 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: codfw row D switches upgrade done - [[phab:T335042|T335042]] | |||
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage | |||
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage | |||
* 13:54 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - [[phab:T335042|T335042]] | |||
* 13:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye | |||
* 13:49 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-eqiad | |||
* 13:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 13:46 Emperor: repool ms-fe2012 [[phab:T335042|T335042]] | |||
* 13:45 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-eqiad | |||
* 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.codfw.wmnet | |||
* 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.eqiad.wmnet | |||
* 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web | |||
* 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfwm.wmnet,service=thanos-web | |||
* 13:32 taavi@deploy1002: Finished scap: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] (duration: 09m 08s) | |||
* 13:32 Emperor: repool thanos-fe2003 [[phab:T335042|T335042]] | |||
* 13:30 sukhe: running authdns-update to repool codfw | |||
* 13:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org | |||
* 13:25 taavi@deploy1002: mazevedo and taavi: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 13:25 moritzm: enabled Puppet in codfw/esams/ulsfo for switch maintenance [[phab:T335042|T335042]] | |||
* 13:23 taavi@deploy1002: Started scap: Backport for [[gerrit:919372{{!}}Add stream config for mobile apps schema (T336508)]] | |||
* 13:01 XioNoX: asw-d-codfw> request system reboot all-members - [[phab:T335042|T335042]] | |||
* 12:52 Emperor: depool ms-fe2012 [[phab:T335042|T335042]] | |||
* 12:51 Emperor: depool thanos-fe2003 [[phab:T335042|T335042]] | |||
* 12:50 moritzm: disabling Puppet in codfw/esams/ulsfo for switch maintenance [[phab:T335042|T335042]] | |||
* 12:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 189 hosts with reason: codfw row D upgrade | |||
* 12:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 189 hosts with reason: codfw row D upgrade | |||
* 12:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet | |||
* 12:39 akosiaris: reboot rdb1009 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however | |||
* 12:39 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet | |||
* 12:35 godog: start cadvisor 0.44 upgrade to buster hosts - [[phab:T336740|T336740]] | |||
* 12:29 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2] (duration: 01m 30s) | |||
* 12:28 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2] | |||
* 12:27 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 04s) | |||
* 12:27 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] | |||
* 12:24 sukhe: [done] running authdns-update to disable codfw for switch upgrade: [[phab:T335042|T335042]] | |||
* 12:22 sukhe: running authdns-update to disable codfw for switch upgrade: [[phab:T335042|T335042]] | |||
* 12:21 XioNoX: disable ping offload in codfw - [[phab:T335042|T335042]] | |||
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 12:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 12:15 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 10s) | |||
* 12:15 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] | |||
* 12:09 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 12:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 12:04 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 12:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 11:59 kart_: Updated cxserver to 2023-05-16-061239-production ([[phab:T336657|T336657]]) | |||
* 11:57 XioNoX: stage upgrade on asw-d-codfw - [[phab:T335042|T335042]] | |||
* 11:56 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2] (duration: 10m 45s) | |||
* 11:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply | |||
* 11:55 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply | |||
* 11:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 11:52 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 11:51 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-codfw | |||
* 11:50 marostegui: install 10.4.29 on db1151 [[phab:T336462|T336462]] | |||
* 11:50 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply | |||
* 11:49 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply | |||
* 11:47 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-codfw | |||
* 11:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 11:46 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 11:45 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2] | |||
* 11:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply | |||
* 11:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply | |||
* 11:30 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2002.codfw.wmnet with OS bookworm | |||
* 11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet | |||
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 14 hosts with reason: maintenance | |||
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 14 hosts with reason: maintenance | |||
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 11 hosts with reason: maintenance | |||
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 11 hosts with reason: maintenance | |||
* 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: maintenance | |||
* 11:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 13 hosts with reason: maintenance | |||
* 11:20 akosiaris: reboot rdb2007 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however | |||
* 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bookworm | |||
* 11:17 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2004.codfw.wmnet with OS bookworm | |||
* 11:16 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet | |||
* 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bookworm | |||
* 11:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet | |||
* 11:00 moritzm: updated bookworm image to RC3 [[phab:T330495|T330495]] | |||
* 10:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet | |||
* 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet | |||
* 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet | |||
* 10:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 10:52 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet | |||
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet | |||
* 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet | |||
* 10:50 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet | |||
* 10:50 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 10:49 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None | |||
* 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None | |||
* 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None | |||
* 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None | |||
* 10:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) depool all active/active services in codfw: codfw row D switches upgrade - [[phab:T335042|T335042]] | |||
* 10:43 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab-runner1003.eqiad.wmnet | |||
* 10:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 10:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 10:39 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 10:38 jayme@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:35 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade | |||
* 10:34 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade | |||
* 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001" | |||
* 10:33 vgutierrez: testing HAProxy 2.7.8 in cp4052 and cp5032 (upload) - [[phab:T317799|T317799]] | |||
* 10:33 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001" | |||
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:29 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row D switches upgrade - [[phab:T335042|T335042]] | |||
* 10:28 elukey@cumin1001: START - Cookbook sre.dns.netbox | |||
* 10:13 Amir1: cleaning up echo notification table in all wikis ([[phab:T318523|T318523]]) | |||
* 10:07 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . | |||
* 10:06 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . | |||
* 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 09:49 btullis@deploy1002: Finished deploy [airflow-dags/analytics_product@7642b62]: (no justification provided) (duration: 00m 09s) | |||
* 09:49 btullis@deploy1002: Started deploy [airflow-dags/analytics_product@7642b62]: (no justification provided) | |||
* 09:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet | |||
* 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet | |||
* 09:25 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet | |||
* 09:23 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.reboot-runner (exit_code=1) rolling reboot on A:gitlab-runner | |||
* 09:23 jnuche@deploy1002: Installing scap version "4.52.2" for 595 hosts | |||
* 09:21 marostegui: Optimize s5 on dbstore1003 [[phab:T336733|T336733]] | |||
* 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance | |||
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance | |||
* 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance | |||
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance | |||
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance | |||
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance | |||
* 08:18 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2006.wikimedia.org | |||
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance | |||
* 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance | |||
* 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance | |||
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance | |||
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance | |||
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance | |||
* 07:52 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner | |||
* 07:28 Emperor: restart vopsbot.service on alert1001 | |||
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48254 and previous config saved to /var/cache/conftool/dbconfig/20230516-071509-root.json | |||
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48253 and previous config saved to /var/cache/conftool/dbconfig/20230516-071453-root.json | |||
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48252 and previous config saved to /var/cache/conftool/dbconfig/20230516-070005-root.json | |||
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48251 and previous config saved to /var/cache/conftool/dbconfig/20230516-065948-root.json | |||
* 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . | |||
* 06:56 marostegui@deploy1002: Finished scap: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] (duration: 06m 58s) | |||
* 06:51 marostegui@deploy1002: marostegui: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 06:50 eileen: civicrm: revision {{Gerrit|d97a371e}}, config {{Gerrit|686d3cb4}} | |||
* 06:49 marostegui@deploy1002: Started scap: Backport for [[gerrit:919324{{!}}Revert "ProductionServices.php: Promote pc1014 to pc3 master"]] | |||
* 06:49 _joe_: running docker image prune -a in build2001 | |||
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48250 and previous config saved to /var/cache/conftool/dbconfig/20230516-064500-root.json | |||
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48249 and previous config saved to /var/cache/conftool/dbconfig/20230516-064444-root.json | |||
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48248 and previous config saved to /var/cache/conftool/dbconfig/20230516-062955-root.json | |||
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48247 and previous config saved to /var/cache/conftool/dbconfig/20230516-062939-root.json | |||
* 06:24 marostegui@deploy1002: Finished scap: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] (duration: 07m 08s) | |||
* 06:24 eileen: civicrm upgraded from {{Gerrit|ef7b3822}} to {{Gerrit|d97a371e}} | |||
* 06:18 marostegui@deploy1002: marostegui: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 06:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:920147{{!}}ProductionServices.php: Promote pc1014 to pc3 master]] | |||
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48246 and previous config saved to /var/cache/conftool/dbconfig/20230516-061450-root.json | |||
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48245 and previous config saved to /var/cache/conftool/dbconfig/20230516-061434-root.json | |||
* 06:05 marostegui@deploy1002: Finished scap: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] (duration: 07m 21s) | |||
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48244 and previous config saved to /var/cache/conftool/dbconfig/20230516-055946-root.json | |||
* 05:59 marostegui@deploy1002: marostegui: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48243 and previous config saved to /var/cache/conftool/dbconfig/20230516-055929-root.json | |||
* 05:58 marostegui@deploy1002: Started scap: Backport for [[gerrit:919323{{!}}Revert "ProductionServices.php: Failover pc3 codfw host"]] | |||
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 [[phab:T336332|T336332]]', diff saved to https://phabricator.wikimedia.org/P48242 and previous config saved to /var/cache/conftool/dbconfig/20230516-055122-root.json | |||
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48241 and previous config saved to /var/cache/conftool/dbconfig/20230516-054441-root.json | |||
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48240 and previous config saved to /var/cache/conftool/dbconfig/20230516-054425-root.json | |||
* 05:43 marostegui@deploy1002: Finished scap: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] (duration: 07m 15s) | |||
* 05:38 marostegui@deploy1002: marostegui: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 05:36 marostegui@deploy1002: Started scap: Backport for [[gerrit:920139{{!}}ProductionServices.php: Failover pc3 codfw host]] | |||
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48239 and previous config saved to /var/cache/conftool/dbconfig/20230516-052936-root.json | |||
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48238 and previous config saved to /var/cache/conftool/dbconfig/20230516-052920-root.json | |||
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1221 [[phab:T336337|T336337]]', diff saved to https://phabricator.wikimedia.org/P48237 and previous config saved to /var/cache/conftool/dbconfig/20230516-052026-root.json | |||
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T336337|T336337]]', diff saved to https://phabricator.wikimedia.org/P48236 and previous config saved to /var/cache/conftool/dbconfig/20230516-052014-root.json | |||
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.6, 1.41.0-wmf.7 (duration: 02m 26s) | |||
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] (duration: 48m 47s) | |||
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]] | |||
== 2023-05-15 ==
* 23:37 eileen: civicrm upgraded from {{Gerrit|db6e8d69}} to {{Gerrit|ef7b3822}} | |||
* 22:02 maryum: deployed patch for [[phab:T323651|T323651]] | |||
* 21:51 maryum: Deployed patch for [[phab:T335612|T335612]] | |||
* 21:42 ejegg: payments-wiki upgraded from {{Gerrit|c0da741f}} to {{Gerrit|8988a598}} (and globalcollect settings deleted) | |||
* 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet | |||
* 19:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet | |||
* 19:50 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]] | |||
* 19:50 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]] | |||
* 19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet | |||
* 19:49 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]] | |||
* 19:49 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - [[phab:T335042|T335042]] | |||
* 19:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet | |||
* 19:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 2:00:00 on 20 hosts with reason: [[phab:T335042|T335042]] maintenance | |||
* 19:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 2:00:00 on 20 hosts with reason: [[phab:T335042|T335042]] maintenance | |||
* 19:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet | |||
* 19:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet | |||
* 19:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet | |||
* 19:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet | |||
* 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS (duration: 02m 03s) | |||
* 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS | |||
* 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet | |||
* 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet | |||
* 19:19 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s) | |||
* 19:19 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided) | |||
* 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s) | |||
* 19:18 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided) | |||
* 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 05m 46s) | |||
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet | |||
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet | |||
* 19:12 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided) | |||
* 19:12 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: 0.3.124 (duration: 10m 05s) | |||
* 19:03 inflatador: [WDQS Deploy] Tests passing following deploy of `0.3.124` on canary `wdqs1003`; proceeding to rest of fleet | |||
* 19:02 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: 0.3.124 | |||
* 18:54 mutante: LDAP - added uid 'adee' to groups wmde and nda - [[phab:T336434|T336434]] | |||
* 18:54 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.10 ]: codfw row D maint 2023/05/16 [dns2002] [[phab:T335042|T335042]] | |||
* 18:33 brett: Rolling out maglev LVS scheduler in eqsin - [[phab:T263797|T263797]] | |||
* 18:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye | |||
* 18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye | |||
* 18:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye | |||
* 18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye | |||
* 17:47 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 17:47 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:47 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002" | |||
* 17:46 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002" | |||
* 17:42 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:42 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002" | |||
* 17:41 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002" | |||
* 17:39 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 17:39 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 17:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:30 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002" | |||
* 17:29 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002" | |||
* 17:27 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 17:27 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 17:27 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002" | |||
* 17:26 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002" | |||
* 17:15 volans@cumin2002: START - Cookbook sre.dns.netbox | |||
* 17:15 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet | |||
* 15:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete | |||
* 15:00 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete | |||
* 14:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook | |||
* 14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook | |||
* 14:21 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye | |||
* 14:20 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. | |||
* 14:20 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. | |||
* 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. | |||
* 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. | |||
* 14:03 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage | |||
* 14:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage | |||
* 13:56 volans: re-enabled puppet on the install hosts to deploy changes for [[phab:T336485|T336485]] | |||
* 13:45 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye | |||
* 13:33 volans: disabling puppet on the install hosts to deploy changes for [[phab:T336485|T336485]] | |||
* 13:00 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 13:00 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 12:58 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 12:58 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48228 and previous config saved to /var/cache/conftool/dbconfig/20230515-111624-ladsgroup.json | |||
* 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48227 and previous config saved to /var/cache/conftool/dbconfig/20230515-110118-ladsgroup.json | |||
* 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48226 and previous config saved to /var/cache/conftool/dbconfig/20230515-104611-ladsgroup.json | |||
* 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48225 and previous config saved to /var/cache/conftool/dbconfig/20230515-103105-ladsgroup.json | |||
* 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1023 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48224 and previous config saved to /var/cache/conftool/dbconfig/20230515-102038-ladsgroup.json | |||
* 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance | |||
* 10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance | |||
* 10:19 Amir1: Removing db1123 from zarcillo [[phab:T334910|T334910]] | |||
* 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1123.eqiad.wmnet | |||
* 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | |||
* 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48223 and previous config saved to /var/cache/conftool/dbconfig/20230515-101329-ladsgroup.json | |||
* 10:13 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | |||
* 10:11 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox | |||
* 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1123.eqiad.wmnet | |||
* 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48222 and previous config saved to /var/cache/conftool/dbconfig/20230515-095823-ladsgroup.json | |||
* 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1123 from dbctl [[phab:T334910|T334910]]', diff saved to https://phabricator.wikimedia.org/P48221 and previous config saved to /var/cache/conftool/dbconfig/20230515-095412-ladsgroup.json | |||
* 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1123 [[phab:T334910|T334910]]', diff saved to https://phabricator.wikimedia.org/P48220 and previous config saved to /var/cache/conftool/dbconfig/20230515-094938-ladsgroup.json | |||
* 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48219 and previous config saved to /var/cache/conftool/dbconfig/20230515-094317-ladsgroup.json | |||
* 09:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15802 | |||
* 09:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15802 | |||
* 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48218 and previous config saved to /var/cache/conftool/dbconfig/20230515-092810-ladsgroup.json | |||
* 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48217 and previous config saved to /var/cache/conftool/dbconfig/20230515-091139-ladsgroup.json | |||
* 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance | |||
* 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance | |||
* 09:08 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync | |||
* 09:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync | |||
* 08:45 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 08:45 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 08:26 elukey: restart pybal on lvs2010 and lvs2009 to pick up new LVS VIP for ml-staging k8s ingress - [[phab:T335756|T335756]] | |||
* 08:26 volans: installed spicerack_7.1.0 on cumin1001 | |||
* 08:22 volans: installed spicerack_7.1.0 on cumin2002 | |||
* 08:08 volans: uploaded spicerack_7.1.0 to apt.wikimedia.org bullseye-wikimedia | |||
* 05:36 _joe_: building bookworm image for the first time [[phab:T335560|T335560]] | |||
== 2023-05-12 == | |||
* 22:59 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 22:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update cloudswift ip address - pt1979@cumin2002" | |||
* 22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update cloudswift ip address - pt1979@cumin2002" | |||
* 22:32 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 22:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox | |||
* 22:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 21:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS buster | |||
* 21:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS buster | |||
* 20:08 mutante: gerrit1001 - systemctl mask gerrit [[phab:T326368|T326368]] | |||
* 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 18:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 18:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 18:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudswift1001'] | |||
* 17:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1001'] | |||
* 17:59 sukhe: running authdns-update for CR 919388 | |||
* 17:31 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys [[phab:T326767|T326767]] (duration: 150m 34s) | |||
* 17:27 sukhe: set routing-options static route 208.80.153.240/28 [high-traffic2, codfw] next-hop 10.192.16.140: [[phab:T326767|T326767]] | |||
* 17:21 sukhe: restart pybal on lvs2012 to pick up bgp med change: [[phab:T326767|T326767]] | |||
* 17:11 sukhe: homer "cr*-codfw*" commit "Gerrit: 917924 add new LVS host lvs2012": [[phab:T326767|T326767]] | |||
* 17:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage | |||
* 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance | |||
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance | |||
* 16:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage | |||
* 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:08 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:54 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye | |||
* 15:09 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:02 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns2001.wikimedia.org | |||
* 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 15:01 sukhe@cumin2002: START - Cookbook sre.dns.netbox | |||
* 15:01 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys [[phab:T326767|T326767]] | |||
* 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2001.wikimedia.org | |||
* 14:39 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 14:38 cdanis: silencing jobrunner/videoscaler probes for the weekend -- silence ID 21903b52-047b-43d9-94be-908a4b92b5a7 | |||
* 14:38 sukhe@cumin2002: START - Cookbook sre.dns.netbox | |||
* 14:36 cdanis: silencing jobrunner/videoscaler probes for the weekend | |||
* 14:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns2001.wikimedia.wmnet | |||
* 14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2001.wikimedia.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 14:34 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2001.wikimedia.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002" | |||
* 14:29 sukhe@cumin2002: START - Cookbook sre.dns.netbox | |||
* 14:24 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2001.wikimedia.wmnet | |||
* 14:15 sukhe: [done] homer "cr*-codfw*" commit "Gerrit: 917364 remove to-be decommissioned host dns2001": [[phab:T335777|T335777]] | |||
* 14:13 sukhe: homer "cr*-codfw*" commit "Gerrit: 917364 remove to-be decommissioned host dns2001": [[phab:T335777|T335777]] | |||
* 13:54 sukhe: enable puppet and run agent in A:dns-rec: done deploying CR 919067 | |||
* 13:38 sukhe: disable puppet on A:dns-rec to merge CR 919067 | |||
* 13:22 sukhe: sudo cumin -b1 -s1200 'A:cp and A:eqiad' 'varnish-frontend-restart': [[phab:T253093|T253093]] | |||
* 13:06 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 13:06 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
* 12:46 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 12:45 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
* 12:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 12:26 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
* 11:58 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply | |||
* 11:56 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply | |||
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48215 and previous config saved to /var/cache/conftool/dbconfig/20230512-113514-root.json | |||
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48213 and previous config saved to /var/cache/conftool/dbconfig/20230512-112010-root.json | |||
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48212 and previous config saved to /var/cache/conftool/dbconfig/20230512-110505-root.json | |||
* 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48211 and previous config saved to /var/cache/conftool/dbconfig/20230512-105000-root.json | |||
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48210 and previous config saved to /var/cache/conftool/dbconfig/20230512-103455-root.json | |||
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48209 and previous config saved to /var/cache/conftool/dbconfig/20230512-101950-root.json | |||
* 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Maintenance | |||
* 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Maintenance | |||
* 10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance | |||
* 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance | |||
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48208 and previous config saved to /var/cache/conftool/dbconfig/20230512-100446-root.json | |||
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48206 and previous config saved to /var/cache/conftool/dbconfig/20230512-094941-root.json | |||
* 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2131.codfw.wmnet with reason: Maintenance | |||
* 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2131.codfw.wmnet with reason: Maintenance | |||
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P48205 and previous config saved to /var/cache/conftool/dbconfig/20230512-093950-root.json | |||
* 09:18 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs [[phab:T330214|T330214]] | |||
* 09:13 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1494.eqiad.wmnet | |||
* 09:13 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw146[7-9].eqiad.wmnet | |||
* 09:08 hashar@deploy1002: Finished scap: Backport for [[gerrit:919178{{!}}Reset the cached skin in RequestContext::setUser() (T336504)]] (duration: 16m 27s) | |||
* 08:54 hashar@deploy1002: hashar: Backport for [[gerrit:919178{{!}}Reset the cached skin in RequestContext::setUser() (T336504)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 08:52 hashar@deploy1002: Started scap: Backport for [[gerrit:919178{{!}}Reset the cached skin in RequestContext::setUser() (T336504)]] | |||
* 08:03 _joe_: restarting envoy on all jobrunners pooled in the jobrunner cluster [[phab:T336554|T336554]] | |||
* 08:00 _joe_: do it also on mw1438 | |||
* 07:59 _joe_: restarting envoyproxy on mw1439 to rebalance connections (see [[phab:T336554|T336554]]) | |||
* 07:57 taavi@deploy1002: Finished scap: Backport for [[gerrit:919269{{!}}Disable Graph (again) (T336556)]] (duration: 12m 29s) | |||
* 07:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335 | |||
* 07:46 taavi@deploy1002: taavi: Backport for [[gerrit:919269{{!}}Disable Graph (again) (T336556)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 07:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335 | |||
* 07:45 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox | |||
* 07:44 taavi@deploy1002: Started scap: Backport for [[gerrit:919269{{!}}Disable Graph (again) (T336556)]] | |||
* 07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20940 | |||
* 07:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20940 | |||
* 07:28 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox | |||
* 07:27 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary | |||
* 07:27 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary | |||
* 05:33 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1495.eqiad.wmnet | |||
* 05:32 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1466.eqiad.wmnet | |||
* 05:32 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1458.eqiad.wmnet | |||
* 05:31 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1461.eqiad.wmnet | |||
* 02:33 ejegg: payments-wiki upgraded from {{Gerrit|d1c5fefc}} to {{Gerrit|c0da741f}} | |||
* 02:32 ejegg: SmashPig upgraded from {{Gerrit|a9fa7a2c}} to {{Gerrit|5460dbe2}} | |||
* 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus6001.drmrs.wmnet | |||
* 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | |||
* 01:07 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | |||
* 01:01 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 00:57 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus6001.drmrs.wmnet | |||
* 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus5001.eqsin.wmnet | |||
* 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | |||
* 00:50 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | |||
* 00:48 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 00:44 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus5001.eqsin.wmnet | |||
* 00:32 denisse: manually removing prometheus4001.ulsfo.wmnet from the Ganeti master after a failed step in the decommission cookbook - [[phab:T335585|T335585]] | |||
* 00:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on prometheus3001.esams.wmnet with reason: maintenance | |||
* 00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on prometheus3001.esams.wmnet with reason: maintenance | |||
== 2023-05-11 == | |||
* 23:39 mutante: LDAP - added uid lorenjohnson to groups wmde and nda [[phab:T335858|T335858]] | |||
* 23:39 mutante: LDAP - added uid roti to groups wmde and nda [[phab:T336435|T336435]] | |||
* 23:24 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw14(3[789]{{!}}4[056]{{!}}57)\.eqiad\.wmnet | |||
* 23:22 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw14(5[89]{{!}}6[016789]{{!}}9[45])\.eqiad\.wmnet | |||
* 23:22 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw14(3[789]{{!}}4[056]57)\.eqiad\.wmnet | |||
* 23:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudswift1002'] | |||
* 22:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1002'] | |||
* 22:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudswift1001'] | |||
* 22:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1001'] | |||
* 21:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 21:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 21:10 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad | |||
* 21:07 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad | |||
* 21:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts db1225.eqiad.wmnet | |||
* 21:07 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw | |||
* 21:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1225.eqiad.wmnet | |||
* 21:05 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw | |||
* 20:58 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:919175{{!}}Personalized praise: Do not suggest users with Homepage disabled (T336300)]], [[gerrit:919176{{!}}Personalized praise: Do not suggest users with Homepage disabled (T336300)]] (duration: 07m 30s) | |||
* 20:52 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:919175{{!}}Personalized praise: Do not suggest users with Homepage disabled (T336300)]], [[gerrit:919176{{!}}Personalized praise: Do not suggest users with Homepage disabled (T336300)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 20:51 urbanecm@deploy1002: Started scap: Backport for [[gerrit:919175{{!}}Personalized praise: Do not suggest users with Homepage disabled (T336300)]], [[gerrit:919176{{!}}Personalized praise: Do not suggest users with Homepage disabled (T336300)]] | |||
* 20:50 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:912310{{!}}[Growth] Remove config variables provided by extension]] (duration: 20m 04s) | |||
* 20:37 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus4001.ulsfo.wment | |||
* 20:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:36 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:32 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus4001.ulsfo.wment | |||
* 20:31 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:912310{{!}}[Growth] Remove config variables provided by extension]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:30 urbanecm@deploy1002: Started scap: Backport for [[gerrit:912310{{!}}[Growth] Remove config variables provided by extension]] | |||
* 20:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED | |||
* 20:22 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:919168{{!}}Allow http://localhost callback URL (T299737)]] (duration: 09m 37s) | |||
* 20:22 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus3001.esams.wment | |||
* 20:22 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 20:21 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3001.esams.wment decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | |||
* 20:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3001.esams.wment decommissioned, removing all IPs except the asset tag one - denisse@cumin1001" | |||
* 20:18 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 20:17 denisse: manually remove prometheus3001.esams.wmnet from the ganeti master after a failed step in the decommission cookbook. | |||
* 20:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3001.esams.wment | |||
* 20:14 thcipriani@deploy1002: bd808 and thcipriani: Backport for [[gerrit:919168{{!}}Allow http://localhost callback URL (T299737)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:12 thcipriani@deploy1002: Started scap: Backport for [[gerrit:919168{{!}}Allow http://localhost callback URL (T299737)]] | |||
* 19:56 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus3001.esams.wment | |||
* 19:56 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 19:55 denisse@cumin1001: START - Cookbook sre.dns.netbox | |||
* 19:51 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3001.esams.wment | |||
* 19:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye | |||
* 19:06 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye | |||
* 18:46 ejegg: civicrm upgraded from {{Gerrit|d8a1a562}} to {{Gerrit|db6e8d69}} | |||
* 17:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-airflow1006.eqiad.wmnet with reason: Silence error notifications/alerts during setup | |||
* 17:46 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-airflow1006.eqiad.wmnet with reason: Silence error notifications/alerts during setup | |||
* 17:24 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync | |||
* 17:12 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.8 refs [[phab:T330214|T330214]] (duration: 06m 14s) | |||
* 17:12 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync | |||
* 17:11 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 17:10 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 17:08 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 17:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 17:06 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 17:06 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply | |||
* 17:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.8 refs [[phab:T330214|T330214]] | |||
* 17:05 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 17:01 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply | |||
* 17:00 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply | |||
* 16:58 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. | |||
* 16:58 bking@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'. | |||
* 16:57 bking@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 16:56 bking@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 16:56 bking@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 16:56 bking@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 16:56 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply | |||
* 16:56 bking@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 16:55 bking@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 16:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 16:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 16:50 hashar: CI / Zuul was slow to report build results back to Gerrit most probably due to lack of IPv6 ([[phab:T336524|T336524]]) which should be solved now. | |||
* 16:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 16:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48203 and previous config saved to /var/cache/conftool/dbconfig/20230511-164125-ladsgroup.json | |||
* 16:37 brennen: train 1.41.0-wmf.8 ([[phab:T330214|T330214]]): rolling back to group1 to test for [[phab:T336504|T336504]] presence/absence on enwiki | |||
* 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020', diff saved to https://phabricator.wikimedia.org/P48201 and previous config saved to /var/cache/conftool/dbconfig/20230511-162619-ladsgroup.json | |||
* 16:16 elukey: benthos webrequest live instances migrated to kafka-franz (new consumer client, data may have some holes) - [[phab:T331801|T331801]] | |||
* 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020', diff saved to https://phabricator.wikimedia.org/P48200 and previous config saved to /var/cache/conftool/dbconfig/20230511-161113-ladsgroup.json | |||
* 16:08 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bullseye | |||
* 16:01 Amir1: Removing db1110 from zarcillo [[phab:T335011|T335011]] | |||
* 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1110.eqiad.wmnet | |||
* 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1110.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | |||
* 15:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1110.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | |||
* 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48199 and previous config saved to /var/cache/conftool/dbconfig/20230511-155607-ladsgroup.json | |||
* 15:49 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox | |||
* 15:48 hashar: CI back up and fully operational (after the Gerrit upgrade) | |||
* 15:48 mutante: gerrit maintenance period ended - gerrit switched to new hardware, IP and distro version | |||
* 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2020 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48198 and previous config saved to /var/cache/conftool/dbconfig/20230511-154533-ladsgroup.json | |||
* 15:45 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage | |||
* 15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2020.codfw.wmnet with reason: Maintenance | |||
* 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2020.codfw.wmnet with reason: Maintenance | |||
* 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1110.eqiad.wmnet | |||
* 15:42 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage | |||
* 15:27 sukhe: [done] running homer for CR 919151: resolve connection issues to gerrit.wikimedia.org | |||
* 15:27 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bullseye | |||
* 15:21 sukhe: running homer for CR 919151: resolve connection issues to gerrit.wikimedia.org | |||
* 15:18 urandom: altering image_suggestions schema (generated data platform) — [[phab:T336424|T336424]] | |||
* 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48197 and previous config saved to /var/cache/conftool/dbconfig/20230511-144959-ladsgroup.json | |||
* 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024', diff saved to https://phabricator.wikimedia.org/P48195 and previous config saved to /var/cache/conftool/dbconfig/20230511-143453-ladsgroup.json | |||
* 14:27 moritzm: installing avahi security updates | |||
* 14:26 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2012 | |||
* 14:26 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2012 | |||
* 14:25 bking@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'. | |||
* 14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: maintenance | |||
* 14:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: maintenance | |||
* 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024', diff saved to https://phabricator.wikimedia.org/P48194 and previous config saved to /var/cache/conftool/dbconfig/20230511-141947-ladsgroup.json | |||
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: maintenance | |||
* 14:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: maintenance | |||
* 14:15 bking@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'. | |||
* 14:15 bking@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. | |||
* 14:15 sukhe: sudo cumin -b1 -s1200 'A:cp and A:codfw' 'varnish-frontend-restart': [[phab:T253093|T253093]] | |||
* 14:11 bking@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. | |||
* 14:09 bking@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | |||
* 14:08 bking@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. | |||
* 14:08 thcipriani: starting Gerrit Switchover (Take II): The Reckoning | |||
* 14:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48192 and previous config saved to /var/cache/conftool/dbconfig/20230511-140440-ladsgroup.json | |||
* 13:57 elukey: upgrade benthos (4.9.1 -> 4.15.0) on centrallog nodes - [[phab:T331801|T331801]] | |||
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2024 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48191 and previous config saved to /var/cache/conftool/dbconfig/20230511-135335-ladsgroup.json | |||
* 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance | |||
* 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance | |||
* 13:49 moritzm: uploaded wmf-laptop 0.5.7 to component/wmf-sre-laptop | |||
* 13:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye | |||
* 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage | |||
* 13:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage | |||
* 13:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 13:22 elukey: upload benthos 4.15.0-1 to <nowiki>{</nowiki>buster,bullseye<nowiki>}</nowiki>-wikimedia - [[phab:T331801|T331801]] | |||
* 13:13 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye | |||
* 13:07 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe2004.codfw.wmnet | |||
* 13:07 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2004.codfw.wmnet,service=thanos-web | |||
* 13:07 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfw.wmnet | |||
* 13:07 filippo@cumin1001: conftool action : set/pooled=true; selector: name=thanos-fe2004.eqiad.wmnet | |||
* 13:06 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.eqiad.wmnet | |||
* 13:06 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe2004.eqiad.wmnet | |||
* 13:05 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1004.eqiad.wmnet,service=thanos-web | |||
* 13:05 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe1004.eqiad.wmnet | |||
* 12:58 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:919051{{!}}Add outreachwiki to wikidataclient dblist (T171140)]] (duration: 11m 05s) | |||
* 12:54 godog: roll-restart thanos-fe swift-proxy to apply config changes - [[phab:T336348|T336348]] | |||
* 12:48 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:919051{{!}}Add outreachwiki to wikidataclient dblist (T171140)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet | |||
* 12:47 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:919051{{!}}Add outreachwiki to wikidataclient dblist (T171140)]] | |||
* 12:41 Amir1: creating wikidata client tables for outreachwiki ([[phab:T171140|T171140]]) | |||
* 12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bullseye | |||
* 12:01 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade | |||
* 11:57 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage | |||
* 11:54 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage | |||
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48190 and previous config saved to /var/cache/conftool/dbconfig/20230511-115201-root.json | |||
* 11:39 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bullseye | |||
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48189 and previous config saved to /var/cache/conftool/dbconfig/20230511-113657-root.json | |||
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48187 and previous config saved to /var/cache/conftool/dbconfig/20230511-112152-root.json | |||
* 11:08 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade | |||
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48186 and previous config saved to /var/cache/conftool/dbconfig/20230511-110647-root.json | |||
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48185 and previous config saved to /var/cache/conftool/dbconfig/20230511-105142-root.json | |||
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48184 and previous config saved to /var/cache/conftool/dbconfig/20230511-103638-root.json | |||
* 10:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 10:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48183 and previous config saved to /var/cache/conftool/dbconfig/20230511-102133-root.json | |||
* 10:17 moritzm: installing modsecurity-crs security updates | |||
* 10:10 moritzm: installing protobuf security updates | |||
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48182 and previous config saved to /var/cache/conftool/dbconfig/20230511-100628-root.json | |||
* 09:35 moritzm: installing distro-info-data updates on buster | |||
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137', diff saved to https://phabricator.wikimedia.org/P48181 and previous config saved to /var/cache/conftool/dbconfig/20230511-092848-root.json | |||
* 08:59 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade | |||
* 08:56 jelto@cumin1001: END (ERROR) - Cookbook sre.gitlab.upgrade (exit_code=97) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade | |||
* 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev | |||
* 08:40 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade | |||
* 08:40 elukey: `apt-get clean` on orespoolcounter nodes to free space in the root partition | |||
* 08:33 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade | |||
* 08:13 moritzm: installing Linux 4.19.282 updates on Buster systems | |||
* 08:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs [[phab:T330214|T330214]] | |||
* 08:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev | |||
* 08:05 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade | |||
* 07:43 jmm@cumin2002: END (FAIL) - Cookbook sre.cassandra.roll-reboot (exit_code=1) rolling reboot on A:cassandra-dev | |||
* 07:43 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev | |||
* 07:41 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade | |||
* 07:39 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 2518 | |||
* 07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518 | |||
* 07:14 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 20940 | |||
* 07:13 marostegui@deploy1002: Finished scap: Backport for [[gerrit:918535{{!}}Revert "ProductionServices.php: Failover pc2 eqiad master"]] (duration: 07m 41s) | |||
* 07:07 marostegui@deploy1002: marostegui: Backport for [[gerrit:918535{{!}}Revert "ProductionServices.php: Failover pc2 eqiad master"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 07:05 marostegui@deploy1002: Started scap: Backport for [[gerrit:918535{{!}}Revert "ProductionServices.php: Failover pc2 eqiad master"]] | |||
* 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2002.wikimedia.org | |||
* 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2002.wikimedia.org | |||
* 06:42 marostegui@deploy1002: Finished scap: Backport for [[gerrit:918923{{!}}ProductionServices.php: Failover pc2 eqiad master]] (duration: 08m 23s) | |||
* 06:36 marostegui@deploy1002: marostegui: Backport for [[gerrit:918923{{!}}ProductionServices.php: Failover pc2 eqiad master]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 06:34 marostegui@deploy1002: Started scap: Backport for [[gerrit:918923{{!}}ProductionServices.php: Failover pc2 eqiad master]] | |||
* 06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20940 | |||
* 06:29 marostegui@deploy1002: Finished scap: Backport for [[gerrit:918534{{!}}Revert "ProductionServices.php: Failover pc2 codfw master"]] (duration: 08m 12s) | |||
* 06:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17676 | |||
* 06:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 17676 | |||
* 06:22 marostegui@deploy1002: marostegui: Backport for [[gerrit:918534{{!}}Revert "ProductionServices.php: Failover pc2 codfw master"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet | |||
* 06:21 marostegui@deploy1002: Started scap: Backport for [[gerrit:918534{{!}}Revert "ProductionServices.php: Failover pc2 codfw master"]] | |||
* 06:21 XioNoX: Configure/reconfigure 1:1 NAT for new fr-tech hosts (frbast2002, frmon2002) - [[phab:T336450|T336450]] | |||
* 06:15 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 13335 | |||
* 06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335 | |||
* 06:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714 | |||
* 06:07 marostegui@deploy1002: Finished scap: Backport for [[gerrit:918903{{!}}ProductionServices.php: Failover pc2 codfw master]] (duration: 07m 42s) | |||
* 06:05 kart_: Updated MinT to 2023-05-11-051736-production | |||
* 06:01 marostegui@deploy1002: marostegui: Backport for [[gerrit:918903{{!}}ProductionServices.php: Failover pc2 codfw master]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet | |||
* 06:00 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:59 marostegui@deploy1002: Started scap: Backport for [[gerrit:918903{{!}}ProductionServices.php: Failover pc2 codfw master]] | |||
* 05:58 marostegui@deploy1002: marostegui: Backport for [[gerrit:918903{{!}}ProductionServices.php: Failover pc2 codfw master]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet | |||
* 05:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714 | |||
* 05:57 marostegui@deploy1002: Started scap: Backport for [[gerrit:918903{{!}}ProductionServices.php: Failover pc2 codfw master]] | |||
* 05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 05:55 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 05:48 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: [[phab:T335396|T335396]] | |||
* 05:48 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: [[phab:T335396|T335396]] | |||
* 05:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
== 2023-05-10 == | |||
* 22:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2021.codfw.wmnet with OS buster | |||
* 21:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage | |||
* 21:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage | |||
* 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster | |||
* 21:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS buster | |||
* 21:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster | |||
* 20:58 ejegg: payments-wiki upgraded from {{Gerrit|2125cea7}} to {{Gerrit|d1c5fefc}} | |||
* 20:58 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye | |||
* 20:55 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@02d6ac9]: (no justification provided) (duration: 00m 11s) | |||
* 20:55 milimetric@deploy1002: Started deploy [airflow-dags/analytics@02d6ac9]: (no justification provided) | |||
* 20:33 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: Gerrit to 3.5.6 on gerrit1003 {{!}} [[phab:T336339|T336339]] (duration: 00m 06s) | |||
* 20:33 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: Gerrit to 3.5.6 on gerrit1003 {{!}} [[phab:T336339|T336339]] | |||
* 20:32 cjming: end of UTC late backport window | |||
* 20:21 cjming@deploy1002: Finished scap: Backport for [[gerrit:918531{{!}}Remove unnecessary jQuery closure (T324913)]] (duration: 09m 02s) | |||
* 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48177 and previous config saved to /var/cache/conftool/dbconfig/20230510-202014-ladsgroup.json | |||
* 20:14 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:918531{{!}}Remove unnecessary jQuery closure (T324913)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:12 cjming@deploy1002: Started scap: Backport for [[gerrit:918531{{!}}Remove unnecessary jQuery closure (T324913)]] | |||
* 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P48176 and previous config saved to /var/cache/conftool/dbconfig/20230510-200508-ladsgroup.json | |||
* 20:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye | |||
* 20:00 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172] (duration: 00m 05s) | |||
* 20:00 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172] | |||
* 20:00 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172] (duration: 00m 26s) | |||
* 19:59 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172] | |||
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P48175 and previous config saved to /var/cache/conftool/dbconfig/20230510-195001-ladsgroup.json | |||
* 19:47 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw | |||
* 19:35 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172]: Regular analytics weekly train [analytics/refinery@4ccc172] (duration: 40m 28s) | |||
* 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48174 and previous config saved to /var/cache/conftool/dbconfig/20230510-193455-ladsgroup.json | |||
* 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48173 and previous config saved to /var/cache/conftool/dbconfig/20230510-192746-ladsgroup.json | |||
* 19:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance | |||
* 19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance | |||
* 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48172 and previous config saved to /var/cache/conftool/dbconfig/20230510-192722-ladsgroup.json | |||
* 19:25 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001 | |||
* 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P48171 and previous config saved to /var/cache/conftool/dbconfig/20230510-191216-ladsgroup.json | |||
* 19:08 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001 | |||
* 19:00 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001 | |||
* 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P48170 and previous config saved to /var/cache/conftool/dbconfig/20230510-185710-ladsgroup.json | |||
* 18:54 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172]: Regular analytics weekly train [analytics/refinery@4ccc172] | |||
* 18:45 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys [[phab:T326767|T326767]] (duration: 191m 53s) | |||
* 18:43 ejegg: payments-wiki upgraded from {{Gerrit|ec5a5e92}} to {{Gerrit|2125cea7}} | |||
* 18:43 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001 | |||
* 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48169 and previous config saved to /var/cache/conftool/dbconfig/20230510-184202-ladsgroup.json | |||
* 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48168 and previous config saved to /var/cache/conftool/dbconfig/20230510-183441-ladsgroup.json | |||
* 18:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance | |||
* 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance | |||
* 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48167 and previous config saved to /var/cache/conftool/dbconfig/20230510-183418-ladsgroup.json | |||
* 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P48166 and previous config saved to /var/cache/conftool/dbconfig/20230510-181912-ladsgroup.json | |||
* 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P48165 and previous config saved to /var/cache/conftool/dbconfig/20230510-180406-ladsgroup.json | |||
* 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48164 and previous config saved to /var/cache/conftool/dbconfig/20230510-174859-ladsgroup.json | |||
* 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48163 and previous config saved to /var/cache/conftool/dbconfig/20230510-174143-ladsgroup.json | |||
* 17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance | |||
* 17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance | |||
* 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48162 and previous config saved to /var/cache/conftool/dbconfig/20230510-174119-ladsgroup.json | |||
* 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P48161 and previous config saved to /var/cache/conftool/dbconfig/20230510-172613-ladsgroup.json | |||
* 17:23 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage | |||
* 17:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage | |||
* 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P48160 and previous config saved to /var/cache/conftool/dbconfig/20230510-171107-ladsgroup.json | |||
* 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48159 and previous config saved to /var/cache/conftool/dbconfig/20230510-165601-ladsgroup.json | |||
* 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:50 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48158 and previous config saved to /var/cache/conftool/dbconfig/20230510-164842-ladsgroup.json | |||
* 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance | |||
* 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance | |||
* 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48157 and previous config saved to /var/cache/conftool/dbconfig/20230510-164818-ladsgroup.json | |||
* 16:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P48156 and previous config saved to /var/cache/conftool/dbconfig/20230510-163312-ladsgroup.json | |||
* 16:31 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:25 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P48155 and previous config saved to /var/cache/conftool/dbconfig/20230510-161806-ladsgroup.json | |||
* 16:15 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 16:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage | |||
* 16:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage | |||
* 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48154 and previous config saved to /var/cache/conftool/dbconfig/20230510-160258-ladsgroup.json | |||
* 16:02 sukhe: sudo cumin -b1 -s1200 'A:cp and A:drmrs' 'varnish-frontend-restart': [[phab:T253093|T253093]] | |||
* 15:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main | |||
* 15:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main | |||
* 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48153 and previous config saved to /var/cache/conftool/dbconfig/20230510-155429-ladsgroup.json | |||
* 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance | |||
* 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance | |||
* 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance | |||
* 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance | |||
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48152 and previous config saved to /var/cache/conftool/dbconfig/20230510-155357-ladsgroup.json | |||
* 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P48151 and previous config saved to /var/cache/conftool/dbconfig/20230510-153851-ladsgroup.json | |||
* 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye | |||
* 15:33 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys [[phab:T326767|T326767]] | |||
* 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P48150 and previous config saved to /var/cache/conftool/dbconfig/20230510-152345-ladsgroup.json | |||
* 15:17 sukhe: running authdns-update for CR 918527 | |||
* 15:16 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 15:16 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 15:14 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 15:14 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 15:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 15:12 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48149 and previous config saved to /var/cache/conftool/dbconfig/20230510-150838-ladsgroup.json | |||
* 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48148 and previous config saved to /var/cache/conftool/dbconfig/20230510-150009-ladsgroup.json | |||
* 15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance | |||
* 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance | |||
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48147 and previous config saved to /var/cache/conftool/dbconfig/20230510-145946-ladsgroup.json | |||
* 14:58 cwhite: install vopsbot 0.3.4 on alert2001 [[phab:T329791|T329791]] | |||
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P48146 and previous config saved to /var/cache/conftool/dbconfig/20230510-144440-ladsgroup.json | |||
* 14:44 moritzm: restarting FPM/Apache on mw canaries to pick up libxml2 updates | |||
* 14:41 moritzm: installing libxml2 security updates on buster | |||
* 14:40 thcipriani: stopping gerrit on gerrit1001 | |||
* 14:40 thcipriani: stopping gerrit on gerrit1003 | |||
* 14:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: migration | |||
* 14:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: migration | |||
* 14:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: migration | |||
* 14:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: migration | |||
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P48145 and previous config saved to /var/cache/conftool/dbconfig/20230510-142934-ladsgroup.json | |||
* 14:26 thcipriani: gerrit1003 switchover happening | |||
* 14:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply | |||
* 14:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply | |||
* 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48144 and previous config saved to /var/cache/conftool/dbconfig/20230510-141427-ladsgroup.json | |||
* 14:08 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox | |||
* 14:08 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox | |||
* 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48143 and previous config saved to /var/cache/conftool/dbconfig/20230510-140708-ladsgroup.json | |||
* 14:07 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary | |||
* 14:07 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary | |||
* 14:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance | |||
* 14:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance | |||
* 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48142 and previous config saved to /var/cache/conftool/dbconfig/20230510-140644-ladsgroup.json | |||
* 14:02 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . | |||
* 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P48140 and previous config saved to /var/cache/conftool/dbconfig/20230510-135138-ladsgroup.json | |||
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P48139 and previous config saved to /var/cache/conftool/dbconfig/20230510-133632-ladsgroup.json | |||
* 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48138 and previous config saved to /var/cache/conftool/dbconfig/20230510-132126-ladsgroup.json | |||
* 13:19 taavi@deploy1002: Finished scap: Backport for [[gerrit:917415{{!}}[arwikisource] Replace the current logo with an identical HD version (T336193)]] (duration: 08m 00s) | |||
* 13:15 _joe_: rolling back vopsbot to 0.3.3 | |||
* 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48137 and previous config saved to /var/cache/conftool/dbconfig/20230510-131412-ladsgroup.json | |||
* 13:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance | |||
* 13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance | |||
* 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48136 and previous config saved to /var/cache/conftool/dbconfig/20230510-131347-ladsgroup.json | |||
* 13:13 taavi@deploy1002: superpes and taavi: Backport for [[gerrit:917415{{!}}[arwikisource] Replace the current logo with an identical HD version (T336193)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet | |||
* 13:11 taavi@deploy1002: Started scap: Backport for [[gerrit:917415{{!}}[arwikisource] Replace the current logo with an identical HD version (T336193)]] | |||
* 13:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet | |||
* 13:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet | |||
* 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P48135 and previous config saved to /var/cache/conftool/dbconfig/20230510-125840-ladsgroup.json | |||
* 12:56 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2002.wikimedia.org | |||
* 12:52 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org | |||
* 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P48134 and previous config saved to /var/cache/conftool/dbconfig/20230510-124334-ladsgroup.json | |||
* 12:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 12:30 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 12:29 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 12:29 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48133 and previous config saved to /var/cache/conftool/dbconfig/20230510-122828-ladsgroup.json | |||
* 12:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 12:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48132 and previous config saved to /var/cache/conftool/dbconfig/20230510-122316-ladsgroup.json | |||
* 12:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance | |||
* 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance | |||
* 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48131 and previous config saved to /var/cache/conftool/dbconfig/20230510-122253-ladsgroup.json | |||
* 12:13 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 12:13 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 12:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 12:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 12:10 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 12:10 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P48129 and previous config saved to /var/cache/conftool/dbconfig/20230510-120747-ladsgroup.json | |||
* 11:58 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 11:58 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 11:57 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 11:57 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 11:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 11:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P48128 and previous config saved to /var/cache/conftool/dbconfig/20230510-115241-ladsgroup.json | |||
* 11:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply | |||
* 11:49 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply | |||
* 11:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply | |||
* 11:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply | |||
* 11:43 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply | |||
* 11:43 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply | |||
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bookworm | |||
* 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48127 and previous config saved to /var/cache/conftool/dbconfig/20230510-113734-ladsgroup.json | |||
* 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48126 and previous config saved to /var/cache/conftool/dbconfig/20230510-113215-ladsgroup.json | |||
* 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance | |||
* 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance | |||
* 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage | |||
* 11:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance | |||
* 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance | |||
* 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48125 and previous config saved to /var/cache/conftool/dbconfig/20230510-112855-ladsgroup.json | |||
* 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage | |||
* 11:18 _joe_: installing vopsbot 0.3.4 on alert1001 [[phab:T329791|T329791]] | |||
* 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P48124 and previous config saved to /var/cache/conftool/dbconfig/20230510-111349-ladsgroup.json | |||
* 11:11 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm | |||
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P48123 and previous config saved to /var/cache/conftool/dbconfig/20230510-105843-ladsgroup.json | |||
* 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48122 and previous config saved to /var/cache/conftool/dbconfig/20230510-104337-ladsgroup.json | |||
* 10:38 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . | |||
* 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48121 and previous config saved to /var/cache/conftool/dbconfig/20230510-103712-ladsgroup.json | |||
* 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance | |||
* 10:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance | |||
* 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48120 and previous config saved to /var/cache/conftool/dbconfig/20230510-103649-ladsgroup.json | |||
* 10:26 Amir1: Removing db1113 from zarcillo [[phab:T336029|T336029]] | |||
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48119 and previous config saved to /var/cache/conftool/dbconfig/20230510-102302-ladsgroup.json | |||
* 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P48118 and previous config saved to /var/cache/conftool/dbconfig/20230510-102142-ladsgroup.json | |||
* 10:21 Amir1: start of clean up of echo notification in wikidatawiki ([[phab:T318523|T318523]]) | |||
* 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1113.eqiad.wmnet | |||
* 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | |||
* 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | |||
* 10:16 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001" | |||
* 10:13 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox | |||
* 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1113.eqiad.wmnet | |||
* 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P48117 and previous config saved to /var/cache/conftool/dbconfig/20230510-100756-ladsgroup.json | |||
* 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P48116 and previous config saved to /var/cache/conftool/dbconfig/20230510-100636-ladsgroup.json | |||
* 10:01 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe2004.codfw.wmnet | |||
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48115 and previous config saved to /var/cache/conftool/dbconfig/20230510-095309-root.json | |||
* 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P48114 and previous config saved to /var/cache/conftool/dbconfig/20230510-095250-ladsgroup.json | |||
* 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48113 and previous config saved to /var/cache/conftool/dbconfig/20230510-095130-ladsgroup.json | |||
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet | |||
* 09:50 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1004.eqiad.wmnet | |||
* 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48112 and previous config saved to /var/cache/conftool/dbconfig/20230510-094452-ladsgroup.json | |||
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance | |||
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance | |||
* 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48111 and previous config saved to /var/cache/conftool/dbconfig/20230510-094429-ladsgroup.json | |||
* 09:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet | |||
* 09:38 daniel@deploy1002: Finished scap: Backport for [[gerrit:918388{{!}}Enable parser cache warming jobs for parsoid on medium wikis (T329366)]] (duration: 08m 10s) | |||
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48110 and previous config saved to /var/cache/conftool/dbconfig/20230510-093804-root.json | |||
* 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48109 and previous config saved to /var/cache/conftool/dbconfig/20230510-093743-ladsgroup.json | |||
* 09:31 daniel@deploy1002: daniel: Backport for [[gerrit:918388{{!}}Enable parser cache warming jobs for parsoid on medium wikis (T329366)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet | |||
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48108 and previous config saved to /var/cache/conftool/dbconfig/20230510-093128-root.json | |||
* 09:30 daniel@deploy1002: Started scap: Backport for [[gerrit:918388{{!}}Enable parser cache warming jobs for parsoid on medium wikis (T329366)]] | |||
* 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P48107 and previous config saved to /var/cache/conftool/dbconfig/20230510-092923-ladsgroup.json | |||
* 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48106 and previous config saved to /var/cache/conftool/dbconfig/20230510-092531-ladsgroup.json | |||
* 09:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance | |||
* 09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance | |||
* 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48105 and previous config saved to /var/cache/conftool/dbconfig/20230510-092507-ladsgroup.json | |||
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48104 and previous config saved to /var/cache/conftool/dbconfig/20230510-092259-root.json | |||
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48103 and previous config saved to /var/cache/conftool/dbconfig/20230510-091624-root.json | |||
* 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P48102 and previous config saved to /var/cache/conftool/dbconfig/20230510-091417-ladsgroup.json | |||
* 09:12 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet | |||
* 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P48101 and previous config saved to /var/cache/conftool/dbconfig/20230510-091001-ladsgroup.json | |||
* 09:09 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage | |||
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48100 and previous config saved to /var/cache/conftool/dbconfig/20230510-090755-root.json | |||
* 09:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage | |||
* 09:01 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet | |||
* 09:01 hashar: Gerrit restarted at version 3.5.6 {{!}} [[phab:T336339|T336339]] | |||
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48099 and previous config saved to /var/cache/conftool/dbconfig/20230510-090119-root.json | |||
* 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48098 and previous config saved to /var/cache/conftool/dbconfig/20230510-085910-ladsgroup.json | |||
* 08:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 {{!}} [[phab:T336339|T336339]] (duration: 00m 05s) | |||
* 08:57 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 {{!}} [[phab:T336339|T336339]] | |||
* 08:56 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 {{!}} [[phab:T336339|T336339]] (duration: 00m 09s) | |||
* 08:56 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 {{!}} [[phab:T336339|T336339]] | |||
* 08:55 hashar: Stopping Gerrit for 3.5.5 > 3.5.6 upgrade [[phab:T336339|T336339]] | |||
* 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P48097 and previous config saved to /var/cache/conftool/dbconfig/20230510-085455-ladsgroup.json | |||
* 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48096 and previous config saved to /var/cache/conftool/dbconfig/20230510-085330-ladsgroup.json | |||
* 08:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance | |||
* 08:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance | |||
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48095 and previous config saved to /var/cache/conftool/dbconfig/20230510-085250-root.json | |||
* 08:51 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 08:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit2002 {{!}} [[phab:T336339|T336339]] (duration: 00m 07s) | |||
* 08:49 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit2002 {{!}} [[phab:T336339|T336339]] | |||
* 08:48 hashar: deploy1002: git reset `/srv/deployment/gerrit/gerrit`, which had a bunch of locally modified files for some reason # [[phab:T336339|T336339]] | |||
* 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance | |||
* 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance | |||
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48094 and previous config saved to /var/cache/conftool/dbconfig/20230510-084614-root.json | |||
* 08:40 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox | |||
* 08:40 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox | |||
* 08:39 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary | |||
* 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48093 and previous config saved to /var/cache/conftool/dbconfig/20230510-083948-ladsgroup.json | |||
* 08:39 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary | |||
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48092 and previous config saved to /var/cache/conftool/dbconfig/20230510-083745-root.json | |||
* 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48091 and previous config saved to /var/cache/conftool/dbconfig/20230510-083253-ladsgroup.json | |||
* 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance | |||
* 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance | |||
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48090 and previous config saved to /var/cache/conftool/dbconfig/20230510-083109-root.json | |||
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48089 and previous config saved to /var/cache/conftool/dbconfig/20230510-082240-root.json | |||
* 08:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.8 refs [[phab:T330214|T330214]] (duration: 05m 55s) | |||
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48088 and previous config saved to /var/cache/conftool/dbconfig/20230510-081605-root.json | |||
* 08:15 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.8 refs [[phab:T330214|T330214]] | |||
* 08:14 godog: re-enable eqsin remote syslog towards centrallog - [[phab:T336345|T336345]] | |||
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48087 and previous config saved to /var/cache/conftool/dbconfig/20230510-080736-root.json | |||
* 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox | |||
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48086 and previous config saved to /var/cache/conftool/dbconfig/20230510-080100-root.json | |||
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48085 and previous config saved to /var/cache/conftool/dbconfig/20230510-080003-root.json | |||
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48084 and previous config saved to /var/cache/conftool/dbconfig/20230510-075957-root.json | |||
* 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors | |||
* 07:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors | |||
* 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host testvm2005.codfw.wmnet with OS bookworm | |||
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48083 and previous config saved to /var/cache/conftool/dbconfig/20230510-074555-root.json | |||
* 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors | |||
* 07:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors | |||
* 07:45 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox | |||
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48082 and previous config saved to /var/cache/conftool/dbconfig/20230510-074458-root.json | |||
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48081 and previous config saved to /var/cache/conftool/dbconfig/20230510-074452-root.json | |||
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2117 [[phab:T334650|T334650]]', diff saved to https://phabricator.wikimedia.org/P48080 and previous config saved to /var/cache/conftool/dbconfig/20230510-074237-root.json | |||
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48079 and previous config saved to /var/cache/conftool/dbconfig/20230510-073833-root.json | |||
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48078 and previous config saved to /var/cache/conftool/dbconfig/20230510-072954-root.json | |||
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48077 and previous config saved to /var/cache/conftool/dbconfig/20230510-072948-root.json | |||
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48076 and previous config saved to /var/cache/conftool/dbconfig/20230510-072329-root.json | |||
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48075 and previous config saved to /var/cache/conftool/dbconfig/20230510-071449-root.json | |||
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48074 and previous config saved to /var/cache/conftool/dbconfig/20230510-071443-root.json | |||
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48073 and previous config saved to /var/cache/conftool/dbconfig/20230510-070824-root.json | |||
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48072 and previous config saved to /var/cache/conftool/dbconfig/20230510-065944-root.json | |||
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48071 and previous config saved to /var/cache/conftool/dbconfig/20230510-065938-root.json | |||
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48070 and previous config saved to /var/cache/conftool/dbconfig/20230510-065319-root.json | |||
* 06:52 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm | |||
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48069 and previous config saved to /var/cache/conftool/dbconfig/20230510-064439-root.json | |||
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48068 and previous config saved to /var/cache/conftool/dbconfig/20230510-064433-root.json | |||
* 06:44 marostegui: dbmaint eqiad failover s3 sanitarium master [[phab:T336252|T336252]] | |||
* 06:41 marostegui@cumin2002: dbctl commit (dc=all): 'Depool db1112 db1212 [[phab:T336252|T336252]]', diff saved to https://phabricator.wikimedia.org/P48067 and previous config saved to /var/cache/conftool/dbconfig/20230510-064119-marostegui.json | |||
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48066 and previous config saved to /var/cache/conftool/dbconfig/20230510-063814-root.json | |||
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48065 and previous config saved to /var/cache/conftool/dbconfig/20230510-062309-root.json | |||
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48064 and previous config saved to /var/cache/conftool/dbconfig/20230510-060805-root.json | |||
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P48063 and previous config saved to /var/cache/conftool/dbconfig/20230510-060656-root.json | |||
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48062 and previous config saved to /var/cache/conftool/dbconfig/20230510-055929-root.json | |||
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48061 and previous config saved to /var/cache/conftool/dbconfig/20230510-055300-root.json | |||
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2151', diff saved to https://phabricator.wikimedia.org/P48060 and previous config saved to /var/cache/conftool/dbconfig/20230510-054833-root.json | |||
* 05:42 kart_: Updated MinT to 2023-05-10-045734-production ([[phab:T331505|T331505]]) | |||
* 05:42 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:37 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply | |||
* 05:35 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply | |||
* 05:28 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply | |||
* 05:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply | |||
* 04:10 mutante: gerrit1001 - rsyncing data over to gerrit1003, as root in a screen, but slowly with bwlimit 5m | |||
* 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" | |||
== 2023-05-09 ==
* 23:43 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad | |||
* 23:25 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad | |||
* 23:22 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad | |||
* 23:02 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad | |||
* 23:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw | |||
* 22:46 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" | |||
* 22:42 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw | |||
* 22:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw | |||
* 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48058 and previous config saved to /var/cache/conftool/dbconfig/20230509-223346-ladsgroup.json | |||
* 22:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage | |||
* 22:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage | |||
* 22:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P48057 and previous config saved to /var/cache/conftool/dbconfig/20230509-221840-ladsgroup.json | |||
* 22:18 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw | |||
* 22:06 inflatador: bking@wcqs1002 depool wcqs1002 while it catches up on lag | |||
* 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P48056 and previous config saved to /var/cache/conftool/dbconfig/20230509-220333-ladsgroup.json | |||
* 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48055 and previous config saved to /var/cache/conftool/dbconfig/20230509-214827-ladsgroup.json | |||
* 21:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 21:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams | |||
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48054 and previous config saved to /var/cache/conftool/dbconfig/20230509-213834-ladsgroup.json | |||
* 21:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance | |||
* 21:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance | |||
* 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48053 and previous config saved to /var/cache/conftool/dbconfig/20230509-213808-ladsgroup.json | |||
* 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P48052 and previous config saved to /var/cache/conftool/dbconfig/20230509-212302-ladsgroup.json | |||
* 21:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams | |||
* 21:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams | |||
* 21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P48051 and previous config saved to /var/cache/conftool/dbconfig/20230509-210755-ladsgroup.json | |||
* 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48050 and previous config saved to /var/cache/conftool/dbconfig/20230509-205249-ladsgroup.json | |||
* 20:52 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams | |||
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2096 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48049 and previous config saved to /var/cache/conftool/dbconfig/20230509-204604-ladsgroup.json | |||
* 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance | |||
* 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance | |||
* 20:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 20:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 20:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs | |||
* 20:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 20:31 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:917733{{!}}Add padding to limited-width toggle to account for close icon (T336274)]], [[gerrit:917734{{!}}Add padding to limited-width toggle to account for close icon (T336274)]] (duration: 08m 59s) | |||
* 20:24 urbanecm@deploy1002: urbanecm and jdrewniak: Backport for [[gerrit:917733{{!}}Add padding to limited-width toggle to account for close icon (T336274)]], [[gerrit:917734{{!}}Add padding to limited-width toggle to account for close icon (T336274)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet | |||
* 20:22 urbanecm@deploy1002: Started scap: Backport for [[gerrit:917733{{!}}Add padding to limited-width toggle to account for close icon (T336274)]], [[gerrit:917734{{!}}Add padding to limited-width toggle to account for close icon (T336274)]] | |||
* 20:22 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:908337{{!}}Remove unused parsoidSettings, nativeGalleryEnabled]] (duration: 07m 11s) | |||
* 20:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs | |||
* 20:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs | |||
* 20:14 urbanecm@deploy1002: Started scap: Backport for [[gerrit:908337{{!}}Remove unused parsoidSettings, nativeGalleryEnabled]] | |||
* 20:10 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:917925{{!}}[Growth] Add mediawiki.mentor_dashboard.personalized_praise stream]] (duration: 07m 26s) | |||
* 20:03 urbanecm@deploy1002: Started scap: Backport for [[gerrit:917925{{!}}[Growth] Add mediawiki.mentor_dashboard.personalized_praise stream]] | |||
* 20:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 19:54 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs | |||
* 19:34 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin | |||
* 19:08 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply | |||
* 19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin | |||
* 18:57 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin | |||
* 18:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 18:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye | |||
* 18:28 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin | |||
* 18:06 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo | |||
* 18:01 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host vrts2001.codfw.wmnet with OS bullseye | |||
* 17:49 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply | |||
* 17:49 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage | |||
* 17:49 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply | |||
* 17:49 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply | |||
* 17:48 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply | |||
* 17:48 rzl@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync | |||
* 17:48 rzl@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync | |||
* 17:48 rzl@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync | |||
* 17:48 rzl@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync | |||
* 17:48 rzl@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync | |||
* 17:48 rzl@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync | |||
* 17:48 rzl@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync | |||
* 17:48 rzl@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync | |||
* 17:47 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply | |||
* 17:47 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo | |||
* 17:47 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply | |||
* 17:47 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply | |||
* 17:46 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply | |||
* 17:46 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply | |||
* 17:46 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage | |||
* 17:46 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply | |||
* 17:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply | |||
* 17:45 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply | |||
* 17:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply | |||
* 17:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply | |||
* 17:42 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply | |||
* 17:42 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply | |||
* 17:31 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts2001.codfw.wmnet with OS bullseye | |||
* 17:31 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye | |||
* 17:31 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye | |||
* 17:28 aokoth@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host vrts2001.codfw.wmnet with OS bullseye | |||
* 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48048 and previous config saved to /var/cache/conftool/dbconfig/20230509-172826-ladsgroup.json | |||
* 17:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo | |||
* 17:17 rzl: rolling restart apache on eqiad appservers [[phab:T225778|T225778]] | |||
* 17:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye | |||
* 17:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" | |||
* 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P48047 and previous config saved to /var/cache/conftool/dbconfig/20230509-171320-ladsgroup.json
* 17:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 17:11 rzl: rolling restart apache on codfw appservers [[phab:T225778|T225778]]
* 17:00 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
* 17:00 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-text_ulsfo
* 17:00 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
* 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P48046 and previous config saved to /var/cache/conftool/dbconfig/20230509-165813-ladsgroup.json
* 16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
* 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
* 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
* 16:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
* 16:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:46 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup2001-dev.codfw.wmnet with OS bullseye
* 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48045 and previous config saved to /var/cache/conftool/dbconfig/20230509-164307-ladsgroup.json
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48044 and previous config saved to /var/cache/conftool/dbconfig/20230509-163646-ladsgroup.json
* 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48043 and previous config saved to /var/cache/conftool/dbconfig/20230509-163621-ladsgroup.json
* 16:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
* 16:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudbackup2001-dev.codfw.wmnet with OS bullseye
* 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
* 16:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2012']
* 16:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
* 16:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48042 and previous config saved to /var/cache/conftool/dbconfig/20230509-162904-ladsgroup.json
* 16:27 rzl: resumed puppet on appservers - [[phab:T225778|T225778]]
* 16:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2012']
* 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2012']
* 16:23 rzl: rzl@mwdebug1001:~$ sudo apache2ctl restart
* 16:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2012']
* 16:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1002.eqiad.wmnet
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P48041 and previous config saved to /var/cache/conftool/dbconfig/20230509-162115-ladsgroup.json
* 16:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 16:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 16:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1002.eqiad.wmnet
* 16:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48039 and previous config saved to /var/cache/conftool/dbconfig/20230509-161358-ladsgroup.json
* 16:11 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 16:09 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 16:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:08 jnuche@deploy1002: Installing scap version "4.52.1" for 593 hosts
* 16:07 rzl: stopping puppet on appservers - [[phab:T225778|T225778]]
* 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2012.mgmt.codfw.wmnet with reboot policy FORCED
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P48038 and previous config saved to /var/cache/conftool/dbconfig/20230509-160608-ladsgroup.json
* 16:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
* 16:01 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
* 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48037 and previous config saved to /var/cache/conftool/dbconfig/20230509-155852-ladsgroup.json
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48036 and previous config saved to /var/cache/conftool/dbconfig/20230509-155102-ladsgroup.json
* 15:50 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts2001.codfw.wmnet with OS bullseye
* 15:48 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
* 15:48 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48035 and previous config saved to /var/cache/conftool/dbconfig/20230509-154346-ladsgroup.json
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48034 and previous config saved to /var/cache/conftool/dbconfig/20230509-154338-ladsgroup.json
* 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
* 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48033 and previous config saved to /var/cache/conftool/dbconfig/20230509-154313-ladsgroup.json
* 15:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
* 15:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48032 and previous config saved to /var/cache/conftool/dbconfig/20230509-153715-ladsgroup.json
* 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
* 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
* 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48031 and previous config saved to /var/cache/conftool/dbconfig/20230509-153651-ladsgroup.json
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P48030 and previous config saved to /var/cache/conftool/dbconfig/20230509-152804-ladsgroup.json
* 15:23 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2012.mgmt.codfw.wmnet with reboot policy FORCED
* 15:22 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48029 and previous config saved to /var/cache/conftool/dbconfig/20230509-152145-ladsgroup.json
* 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2012 - pt1979@cumin2002"
* 15:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2012 - pt1979@cumin2002"
* 15:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P48028 and previous config saved to /var/cache/conftool/dbconfig/20230509-151258-ladsgroup.json
* 15:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host testvm2005.codfw.wmnet with OS bookworm
* 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48027 and previous config saved to /var/cache/conftool/dbconfig/20230509-150639-ladsgroup.json
* 15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2180']
* 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48026 and previous config saved to /var/cache/conftool/dbconfig/20230509-145752-ladsgroup.json
* 14:54 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys [[phab:T326767|T326767]] (duration: 45m 45s)
* 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48025 and previous config saved to /var/cache/conftool/dbconfig/20230509-145133-ladsgroup.json
* 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48024 and previous config saved to /var/cache/conftool/dbconfig/20230509-145128-ladsgroup.json
* 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T335845|T335845]])', diff saved to https://phabricator.wikimedia.org/P48023 and previous config saved to /var/cache/conftool/dbconfig/20230509-145057-ladsgroup.json
* 14:50 sukhe: homer "cr*-codfw*" commit "Gerrit: 917885 remove decommissioned host lvs2008"
* 14:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2180']
* 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2008.codfw.wmnet
* 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"