
Server Admin Log: Difference between revisions

== 2023-05-30 ==
* 23:38 zabe@deploy1002: Finished scap: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]] (duration: 08m 00s)
* 23:31 zabe@deploy1002: zabe: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:30 zabe@deploy1002: Started scap: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]]
* 22:22 ejegg: civicrm upgraded from {{Gerrit|415aa7e5}} to {{Gerrit|5905a403}}
* 21:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] (duration: 07m 48s)
* 21:50 samtar@deploy1002: jforrester and samtar: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 21:48 samtar@deploy1002: Started scap: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]]
* 20:58 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:924569{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:57 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924569{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]]
* 20:40 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] (duration: 09m 27s)
* 20:32 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:30 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]]
* 20:12 inflatador: bking@wdqs2009 depool wdqs2009 until it catches up with lag
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]] (duration: 08m 46s)
* 20:03 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:01 samtar@deploy1002: Started scap: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]]
* 19:48 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. [[phab:T335305|T335305]]. (duration: 00m 09s)
* 19:48 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. [[phab:T335305|T335305]].
* 19:36 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 04m 02s)
* 19:32 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 54s)
* 19:29 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 16m 36s)
* 19:24 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
* 19:12 inflatador: [WDQS Deploy] Deploying version 0.3.124
* 19:11 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 18:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.11 refs [[phab:T337525|T337525]]
* 17:45 mutante: re-enabling puppet on contint2001
* 16:20 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyEdited
* 16:19 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyRegistered
* 16:14 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:924053{{!}}[Growth] Enable user impact refresh on 10 more wikis (T336203)]] (duration: 07m 08s)
* 16:07 urbanecm@deploy1002: Started scap: Backport for [[gerrit:924053{{!}}[Growth] Enable user impact refresh on 10 more wikis (T336203)]]
* 16:00 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:00 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:58 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:57 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:56 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:56 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:54 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2002.codfw.wmnet with OS bullseye
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:49 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:49 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:15 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
* 15:15 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 15:14 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 15:10 tgr_: UTC evening deploys done
* 15:08 tgr@deploy1002: Finished scap: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]] (duration: 08m 08s)
* 15:05 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 15:03 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 15:02 tgr@deploy1002: tgr and matmarex: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 15:00 tgr@deploy1002: Started scap: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]]
* 14:50 moritzm: installing texlive-bin security updates
* 14:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
* 14:46 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
* 14:36 tgr@deploy1002: Finished scap: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]] (duration: 08m 01s)
* 14:29 tgr@deploy1002: matmarex and tgr: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:28 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
* 14:27 tgr@deploy1002: Started scap: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]]
* 14:16 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mwlog2002.codfw.wmnet with OS bullseye
* 14:16 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetdb1003.eqiad.wmnet with OS bookworm
* 14:14 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:13 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:08 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:06 moritzm: installing libwebp security updates
* 14:06 tgr@deploy1002: Finished scap: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]] (duration: 08m 14s)
* 13:59 tgr@deploy1002: tgr and matmarex: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:58 tgr@deploy1002: Started scap: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]]
* 13:57 tgr@deploy1002: Finished scap: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]] (duration: 16m 13s)
* 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2009.codfw.wmnet
* 13:55 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
* 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS bullseye
* 13:42 tgr@deploy1002: tgr and daimona: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:40 tgr@deploy1002: Started scap: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]]
* 13:33 mlitn@deploy1002: Finished scap: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]] (duration: 07m 39s)
* 13:27 mlitn@deploy1002: mlitn: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:25 mlitn@deploy1002: Started scap: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]]
* 13:20 tgr@deploy1002: Finished scap: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]] (duration: 09m 26s)
* 13:13 tgr@deploy1002: tgr: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:11 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
* 13:11 tgr@deploy1002: Started scap: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]]
* 13:09 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
* 13:09 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 13:09 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 13:08 bblack: lvs1018: restart pybal for wikireplicas monitoring removal
* 13:08 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 13:06 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 13:06 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
* 13:06 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 13:03 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
* 13:00 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
* 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS bullseye
* 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:29 volans: disabling puppet where cadvisor is present
* 12:14 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
* 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
* 11:50 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
* 11:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 11:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 11:46 slyngshede@cumin1001: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 11:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
* 11:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
* 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:14 hashar@deploy1002: Finished deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - [[phab:T331651|T331651]] (duration: 00m 08s)
* 11:14 hashar@deploy1002: Started deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - [[phab:T331651|T331651]]
* 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2006.codfw.wmnet with OS bookworm
* 11:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
* 10:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 10:50 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
* 10:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 10:41 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 10:11 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard2003.codfw.wmnet with OS bookworm
* 10:11 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 10:00 zabe@deploy1002: Finished scap: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]] (duration: 08m 12s)
* 09:59 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
* 09:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
* 09:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
* 09:54 zabe@deploy1002: zabe: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 09:52 zabe@deploy1002: Started scap: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]]
* 09:52 zabe@deploy1002: Finished scap: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]] (duration: 09m 52s)
* 09:49 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
* 09:46 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
* 09:43 zabe@deploy1002: zabe: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:43 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bookworm
* 09:42 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:42 zabe@deploy1002: Started scap: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]]
* 09:40 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 09:37 zabe@deploy1002: Finished scap: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]] (duration: 07m 48s)
* 09:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
* 09:33 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:33 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:32 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
* 09:30 zabe@deploy1002: zabe: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:30 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:30 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bookworm
* 09:29 zabe@deploy1002: Started scap: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]]
* 09:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:24 tgr@deploy1002: Finished scap: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]] (duration: 08m 57s)
* 09:22 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 09:20 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 09:19 arturo: run aborrero@cumin1001:~ 2s 98 $ sudo cumin "P<nowiki>{</nowiki>R:Profile::Mariadb::Section = 's7'<nowiki>}</nowiki> and P<nowiki>{</nowiki>P:wmcs::db::wikireplicas::mariadb_multiinstance<nowiki>}</nowiki>" "/usr/local/sbin/maintain-meta_p --all-databases --bootstrap"
* 09:17 tgr@deploy1002: tgr: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 09:15 tgr@deploy1002: Started scap: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]]
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:14 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:13 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:11 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:11 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 09:06 tgr@deploy1002: Finished scap: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]] (duration: 14m 22s)
* 09:00 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:00 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:59 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:54 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:53 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:53 tgr@deploy1002: tgr: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 08:52 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:51 tgr@deploy1002: Started scap: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]]
* 08:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:50 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:49 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:49 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:48 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:43 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:41 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:39 tgr@deploy1002: Finished scap: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]] (duration: 10m 30s)
* 08:39 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:39 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:36 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:36 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:35 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:34 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:33 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:31 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:31 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:30 tgr@deploy1002: tgr: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:29 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:29 tgr@deploy1002: Started scap: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]]
* 08:29 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:27 jayme: re-enable puppet on P:kubernetes::node for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
* 08:25 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:20 jayme: disable puppet on P:kubernetes::node (apart from staging-codfw) for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:15 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:14 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:12 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:12 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
* 08:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
* 08:08 tgr@deploy1002: Finished scap: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]] (duration: 07m 51s)
* 08:01 tgr@deploy1002: tgr: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:00 tgr@deploy1002: Started scap: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]]
* 07:56 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]] (duration: 09m 17s)
* 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb2003.codfw.wmnet with OS bookworm
* 07:48 ladsgroup@deploy1002: func and ladsgroup: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:46 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]]
* 07:45 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:45 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48633 and previous config saved to /var/cache/conftool/dbconfig/20230530-074445-root.json
* 07:44 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:42 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:41 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:40 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:38 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:38 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:31 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:31 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:30 moritzm: move LDAP permissions for hghani from cn=nda to cn=wmf [[phab:T322145|T322145]]
* 07:30 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48632 and previous config saved to /var/cache/conftool/dbconfig/20230530-072941-root.json
* 07:29 kartik@deploy1002: Finished scap: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]] (duration: 09m 38s)
* 07:28 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:21 kartik@deploy1002: kartik: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:19 kartik@deploy1002: Started scap: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]]
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:17 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:16 kartik@deploy1002: Finished scap: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]] (duration: 11m 49s)
* 07:16 moritzm: update bookworm installer to rc4 [[phab:T330495|T330495]]
* 07:16 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48630 and previous config saved to /var/cache/conftool/dbconfig/20230530-071436-root.json
* 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:10 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:09 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 07:06 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]]
* 07:04 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:03 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:02 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:02 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:01 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48629 and previous config saved to /var/cache/conftool/dbconfig/20230530-065932-root.json
* 06:58 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 06:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 06:51 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:50 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:48 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 06:48 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48628 and previous config saved to /var/cache/conftool/dbconfig/20230530-064427-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48625 and previous config saved to /var/cache/conftool/dbconfig/20230530-062922-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48624 and previous config saved to /var/cache/conftool/dbconfig/20230530-061417-root.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48623 and previous config saved to /var/cache/conftool/dbconfig/20230530-055913-root.json
* 05:43 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 62597
* 05:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
* 05:41 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 1255 hosts
* 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 1255 hosts
* 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 784 hosts
* 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 784 hosts
* 05:28 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 784 hosts
* 05:27 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 784 hosts
* 05:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 1255 hosts
* 05:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 1255 hosts
* 05:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62597
* 05:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
* 04:28 kart_: Updated cxserver to 2023-05-29-112644-production ([[phab:T337657|T337657]])
* 04:28 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 04:27 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 04:24 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 04:24 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 04:21 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 04:20 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.9 (duration: 02m 10s)
* 03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.11  refs [[phab:T337525|T337525]] (duration: 49m 54s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.11  refs [[phab:T337525|T337525]]


== 2021-04-03 ==
* 19:20 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 11s)
* 19:18 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch
* 17:30 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 35s)
* 17:26 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
* 16:44 elukey: power reset for ms-be2028 - not reachable via ssh, no tty available via mgmt console, NMI unrecoverable errors logged in iLo's system logs
* 15:35 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 18s)
* 15:33 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
* 15:12 andrew@deploy1002: Finished deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch (duration: 11m 51s)
* 15:00 andrew@deploy1002: Started deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch
* 05:38 andrew@deploy1002: Finished deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 05s)
* 05:35 andrew@deploy1002: Started deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch


== 2023-05-29 ==
* 15:19 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
* 15:19 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
* 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 14:18 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 13:57 vgutierrez@puppetmaster1001: conftool action : set/weight=10; selector: name=dbproxy.*,dc=eqiad
* 11:25 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 11:24 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48618 and previous config saved to /var/cache/conftool/dbconfig/20230529-112242-root.json
* 11:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48617 and previous config saved to /var/cache/conftool/dbconfig/20230529-110737-root.json
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48616 and previous config saved to /var/cache/conftool/dbconfig/20230529-105233-root.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48615 and previous config saved to /var/cache/conftool/dbconfig/20230529-103728-root.json
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48614 and previous config saved to /var/cache/conftool/dbconfig/20230529-102223-root.json
* 10:07 vgutierrez: restarting pybal on lvs1018
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48612 and previous config saved to /var/cache/conftool/dbconfig/20230529-100719-root.json
* 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 10:05 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 10:05 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 10:04 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 10:04 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 10:03 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 10:03 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 10:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 10:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 10:00 vgutierrez: restarting pybal on lvs1020
* 09:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 09:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 09:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 09:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48611 and previous config saved to /var/cache/conftool/dbconfig/20230529-095214-root.json
* 09:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 09:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 09:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 09:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 09:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 09:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48610 and previous config saved to /var/cache/conftool/dbconfig/20230529-093709-root.json
* 09:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 09:31 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 09:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 09:29 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 09:13 godog: start partial rollout of cadvisor to eqiad/codfw (~10%) [[phab:T108027|T108027]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48609 and previous config saved to /var/cache/conftool/dbconfig/20230529-090216-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48608 and previous config saved to /var/cache/conftool/dbconfig/20230529-084711-root.json
* 08:45 godog: delete old raw blocks from thanos - [[phab:T337236|T337236]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48607 and previous config saved to /var/cache/conftool/dbconfig/20230529-083206-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48606 and previous config saved to /var/cache/conftool/dbconfig/20230529-081702-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48604 and previous config saved to /var/cache/conftool/dbconfig/20230529-080157-root.json
* 07:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48603 and previous config saved to /var/cache/conftool/dbconfig/20230529-074653-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48602 and previous config saved to /var/cache/conftool/dbconfig/20230529-073148-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48601 and previous config saved to /var/cache/conftool/dbconfig/20230529-071643-root.json
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s2, s3, s5 [[phab:T337446|T337446]]', diff saved to https://phabricator.wikimedia.org/P48598 and previous config saved to /var/cache/conftool/dbconfig/20230529-051043-root.json


== 2023-05-28 ==
* 13:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 13:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 13:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 13:16 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 06:12 marostegui: Change innodb_fast_shutdown to 0 on db1154 before downgrading [[phab:T337446|T337446]]


== 2021-04-02 ==
* 22:31 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 22:31 bstorm@cumin1001: Added views for new wiki: trvwiki [[phab:T276246|T276246]]
* 22:08 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 22:08 mutante: pooled mw2395,mw2396 as API appservers running on new hardware
* 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[5-6].codfw.wmnet
* 21:58 legoktm: legoktm@lists1002:~$ time sudo mailman-web rebuild_index
* 21:56 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[5-6].codfw.wmnet
* 21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw239[5-6].codfw.wmnet
* 21:48 mutante: mw2395, mw2396 - reboot - becoming API servers
* 21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[0-4].codfw.wmnet
* 21:42 mutante: pooled 12 brand-new codfw appservers running on new hardware generation
* 21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw238[5-9].codfw.wmnet
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2384.codfw.wmnet
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2383.codfw.wmnet
* 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
* 21:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
* 21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
* 21:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
* 21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
* 21:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[0-4].codfw.wmnet
* 21:34 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw238[3-9].codfw.wmnet
* 21:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
* 21:28 legoktm: imported python-xapian-haystack 2.1.0-6~wmf1 on apt1001 ([[phab:T278717|T278717]])
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2394.codfw.wmnet
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2393.codfw.wmnet
* 21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2392.codfw.wmnet
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2391.codfw.wmnet
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2390.codfw.wmnet
* 21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2389.codfw.wmnet
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2388.codfw.wmnet
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2387.codfw.wmnet
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2386.codfw.wmnet
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2385.codfw.wmnet
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2384.codfw.wmnet
* 21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2383.codfw.wmnet
* 21:19 mutante: generating mcrouter certs for mw2395 through mw2404  ([[phab:T278396|T278396]])
* 21:07 mutante: mw2383 through mw2394 - 'uptime && scap pull' via ssh -C (not cumin because it needs to run as non-root)
* 20:58 mutante: mw238* - scap pull via cumin not possible because it doesn't work as root
* 20:50 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: tweak to affinity group options (duration: 03m 39s)
* 20:46 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: tweak to affinity group options
* 20:44 mutante: mw2385 through mw2394 - serial rebooting
* 20:43 mutante: mw2384 reboot
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: new_install
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: new_install
* 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev (duration: 01m 47s)
* 20:39 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev
* 20:09 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:09 bstorm@cumin1001: Added views for new wiki: taywiki [[phab:T275836|T275836]]
* 19:47 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
* 19:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
* 19:07 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 19:07 bstorm@cumin1001: Added views for new wiki: mnwwiktionary [[phab:T276126|T276126]]
* 18:44 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:44 mutante: [puppetmaster1001:~] $ sudo puppet node deactivate mw2247.codfw.wmnet
* 18:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
* 18:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
* 17:57 legoktm: upgraded mailman3 python3-django-postorius on lists1002
* 15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 15:41 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:35 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw133[7-8].eqiad.wmnet
* 14:34 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=videoscaler,name=mw133[5-6].eqiad.wmnet
* 14:32 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw133[5-6].eqiad.wmnet
* 14:31 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw133[7-8].eqiad.wmnet
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
* 14:29 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1111.eqiad.wmnet
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
* 14:20 Urbanecm: Start server-side upload for 3 video files ([[phab:T279060|T279060]], [[phab:T279061|T279061]], [[phab:T279062|T279062]])
* 14:09 Urbanecm: Start server-side upload for 3 video files ([[phab:T279138|T279138]], [[phab:T279137|T279137]], [[phab:T279136|T279136]])
* 13:42 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.37
* 13:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
* 13:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
* 13:11 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/load.php: [[phab:T278579|T278579]] (duration: 00m 58s)
* 13:10 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/OutputHandler.php: [[phab:T278579|T278579]] (duration: 00m 57s)
* 13:08 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/MediaWiki.php: [[phab:T278579|T278579]] (duration: 00m 58s)
* 11:46 Urbanecm: correction: Start server-side upload for 3 video files ([[phab:T279079|T279079]], [[phab:T279080|T279080]], [[phab:T279104|T279104]])
* 11:45 Urbanecm: Start server-side upload for 3 images ([[phab:T279079|T279079]], [[phab:T279080|T279080]], [[phab:T279104|T279104]])
* 10:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
* 10:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
* 10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback group0 wikis to 1.36.0-wmf.36 - [[phab:T278343|T278343]]
* 09:45 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 and group2 wikis to 1.36.0-wmf.36 - [[phab:T278343|T278343]]
* 09:44 hashar@deploy1002: sync-wikiversions aborted: Revert group1 and group2 wikis to 1.36.0-wmf.36 (duration: 00m 01s)
* 09:06 dcausse: remove dumps from wdqs1009 to free disk space
* 07:33 effie: powercycle an-worker1080
* 07:28 elukey: manual fix for an-worker1080's interface in netbox (xe-4/0/11), moved by mistake to public-1b
* 03:54 dwisehaupt: replication user on fundraising db set to require ssl for connections at the mysql user level. db updated on frdb1004 and verified on a set of hosts
* 03:16 dwisehaupt: replication user on payments db set to require ssl for connections at the mysql user level. db updated on payments1001 and verified on a set of hosts


== 2021-04-01 ==
== 2023-05-27 ==
* 23:32 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: [[gerrit:676350{{!}}Revert "Turn on glent m1 AB test"]] [[phab:T262612|T262612]] (duration: 00m 58s)
* 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 ([[phab:T337446|T337446]])
* 23:28 thcipriani: reset /srv/mediawiki-staging/php-1.36.0-wmf.37/extensions/TimedMediaHandler to {{Gerrit|1be781d}} (HEAD of wmf/1.36.0-wmf.37 -- from HEAD of master 49f417)
* 17:42 godog: silence systemd state alert flapping on stat1009 until monday
* 23:12 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Backport: Part III [[gerrit:676451{{!}}Add hi-res version of mediawiki.org logos]] [[phab:T268230|T268230]] (duration: 00m 57s)
* 00:03 tzatziki: removing 1 file for legal compliance
* 23:10 thcipriani@deploy1002: Synchronized logos: Backport: Part II [[gerrit:676451{{!}}Add hi-res version of mediawiki.org logos]] [[phab:T268230|T268230]] (duration: 00m 57s)
* 23:08 thcipriani@deploy1002: Synchronized static: Backport: Part I [[gerrit:676451{{!}}Add hi-res version of mediawiki.org logos]] [[phab:T268230|T268230]] (duration: 00m 59s)
* 22:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2248.codfw.wmnet
* 22:50 twentyafterfour@deploy1002: Finished deploy [releng/phatality@27ddd0b]: deploy phatality (duration: 00m 13s)
* 22:50 twentyafterfour@deploy1002: Started deploy [releng/phatality@27ddd0b]: deploy phatality
* 22:49 twentyafterfour: deploying phatality
* 22:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2248.codfw.wmnet
* 22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
* 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
* 22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2246.codfw.wmnet
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2246.codfw.wmnet
* 21:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2243.codfw.wmnet
* 21:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2243.codfw.wmnet
* 20:42 mutante: mw2243, mw2246, mw2247, mw2248 - depooled - replaced by mw2379, mw2380, mw2381, mw2382 ( [[phab:T277780|T277780]])
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2248.codfw.wmnet
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2247.codfw.wmnet
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2246.codfw.wmnet
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2243.codfw.wmnet
* 20:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2382.codfw.wmnet
* 20:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2381.codfw.wmnet
* 20:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 20:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2379.codfw.wmnet
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2379.codfw.wmnet with reason: new_install
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: new_install
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2380.codfw.wmnet with reason: new_install
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: new_install
* 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2382.codfw.wmnet with reason: new_install
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: new_install
* 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2381.codfw.wmnet with reason: new_install
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: new_install
* 20:01 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1 (duration: 00m 04s)
* 20:01 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1
* 20:01 razzi@deploy1002: deploy aborted: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1hv (duration: 00m 00s)
* 20:01 mutante: mw2379, mw2380, mw2381, mw2382 - scap pull
* 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2382.codfw.wmnet
* 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2381.codfw.wmnet
* 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 19:59 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1 (duration: 00m 21s)
* 19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2379.codfw.wmnet
* 19:58 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1
* 19:57 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:57 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 19:56 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1 (duration: 00m 12s)
* 19:56 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset {{Gerrit|fd7c9eb71e193}}, released after 1.0.1
* 19:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 19:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:51 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 19:37 mutante: pooled parse2001 again after twentyafterfour rebuilt the l10n cache for wmf.37, which fixed it and made the Apache alert recover ([[phab:T268524|T268524]])
* 19:34 mutante: mw2379, mw2380, mw2381, mw2382 - rebooting
* 19:34 twentyafterfour@deploy1002: scap sync-l10n completed (1.36.0-wmf.37) (duration: 02m 38s)
* 19:30 mutante: depooled parse2001 because on train deployment it caused "MWException: No localisation cache found for English" and then "HTTP CRITICAL: HTTP/1.1 500 Internal Server Error" ([[phab:T268524|T268524]])
* 19:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
* 19:28 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2001.codfw.wmnet
* 19:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
* 19:21 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 18:59 mutante: creating mcrouter certs for mw2379 through mw2382
* 18:35 Urbanecm: Morning B&C window done
* 18:33 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo/resources/mediasearch-vue/components/base/Dialog.vue: {{Gerrit|e77f2b98a4fcb7d9cf74c45caeb7cfbc68a063d0}}: Use appendChild() instead of append() ([[phab:T278448|T278448]]) (duration: 01m 09s)
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b485d1ca6779a03912345a094fa1101cef5f091a}}: Enable SandboxLink extension in ptwikinews ([[phab:T278634|T278634]]) (duration: 01m 12s)
* 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: REIMAGE
* 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: REIMAGE
* 17:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:59 Urbanecm: Start server-side upload of two files ([[phab:T279082|T279082]], [[phab:T279081|T279081]])
* 16:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1007.eqiad.wmnet
* 16:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a7acf3357d5d148bad11a2d2718b4da56e1a0cb8}}: hrwiki: Fix help panel links ([[phab:T275684|T275684]]) (duration: 01m 10s)
* 16:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2396.codfw.wmnet with reason: REIMAGE
* 16:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2396.codfw.wmnet with reason: REIMAGE
* 16:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
* 16:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
* 15:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
* 15:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
* 15:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2393.codfw.wmnet with reason: REIMAGE
* 15:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2393.codfw.wmnet with reason: REIMAGE
* 15:32 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2391.codfw.wmnet with reason: REIMAGE
* 15:30 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2391.codfw.wmnet with reason: REIMAGE
* 15:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2392.codfw.wmnet with reason: REIMAGE
* 15:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2392.codfw.wmnet with reason: REIMAGE
* 14:52 volans: uploaded python3-wmflib_0.0.7 to bullseye-wikimedia
* 14:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2390.codfw.wmnet with reason: REIMAGE
* 14:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2390.codfw.wmnet with reason: REIMAGE
* 14:22 effie: disable puppet on mw* canaries, rolling depool and pooling of canaries
* 14:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
* 14:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
* 14:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2389.codfw.wmnet with reason: REIMAGE
* 13:59 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2389.codfw.wmnet with reason: REIMAGE
* 13:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2388.codfw.wmnet with reason: REIMAGE
* 13:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2388.codfw.wmnet with reason: REIMAGE
* 13:24 ema: cp3054: reboot with Linux 4.19.181+1 -- the kernel was not upgraded earlier during [[phab:T273278|T273278]] reboots due to broken dpkg status
* 13:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
* 13:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
* 12:59 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 12:53 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 12:51 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 12:47 moritzm: drain ganeti1022
* 12:46 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 12:45 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 12:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
* 12:40 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 12:38 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
* 12:34 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
* 12:23 moritzm: drain ganeti1021
* 12:21 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
* 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
* 12:15 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
* 12:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
* 11:59 Urbanecm: Start server upload of two video files (~4 GB in total) # [[phab:T278856|T278856]]
* 11:55 moritzm: drain ganeti1020
* 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
* 11:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:675993{{!}}Disable RelatedArticles on Timeless skin on German Wikipedia]] ([[phab:T278611|T278611]]) (duration: 01m 08s)
* 11:41 moritzm: drain ganeti1019
* 11:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
* 11:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
* 11:23 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:674820{{!}}Enable MediaSearch by default for anonymous users]] (duration: 01m 10s)
* 11:20 moritzm: drain ganeti1018
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
* 11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
* 11:00 moritzm: drain ganeti1017
* 10:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
* 10:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
* 10:39 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2002-dev.codfw.wmnet
* 10:33 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2002-dev.codfw.wmnet
* 10:33 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2001-dev.codfw.wmnet
* 10:26 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2001-dev.codfw.wmnet
* 09:07 hashar: contint2001: compressing files with 4 parallel executions:  sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -print0{{!}}xargs -0 -P4 gzip
* 09:01 hashar: contint2001: compressing all fresnel trace--trace.json files: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -exec gzip <nowiki>{</nowiki><nowiki>}</nowiki> \+  # [[phab:T249268|T249268]]
* 08:52 moritzm: drain ganeti1011
* 08:35 moritzm: failover Ganeti master in eqiad to ganeti1009
* 08:25 moritzm: installing ldb security updates
* 08:12 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:12 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 08:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 08:09 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 07:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 07:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 07:55 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 06:37 elukey: powercycle cp1087 (no ssh, no tty via serial console) - [[phab:T278729|T278729]]
* 06:35 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
* 02:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2386.codfw.wmnet with reason: REIMAGE
* 02:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2386.codfw.wmnet with reason: REIMAGE
* 02:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2387.codfw.wmnet with reason: REIMAGE
* 02:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2387.codfw.wmnet with reason: REIMAGE
* 02:16 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
* 02:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
* 01:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2385.codfw.wmnet with reason: REIMAGE
* 01:52 Reedy: `echo "https://www.mediawiki.org/static/images/footer/poweredby_mediawiki_176x62.png" {{!}} mwscript purgeList.php --wiki=enwiki` [[phab:T268230|T268230]]
* 01:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2385.codfw.wmnet with reason: REIMAGE
* 01:51 Reedy: `echo "https://www.mediawiki.org/favicon.ico" {{!}} mwscript purgeList.php --wiki=enwiki` [[phab:T268230|T268230]]
* 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
* 01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
* 01:24 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
* 01:22 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
* 01:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2382.codfw.wmnet with reason: REIMAGE
* 01:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: REIMAGE
* 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2381.codfw.wmnet with reason: REIMAGE
* 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: REIMAGE
* 00:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
* 00:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
* 00:32 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
* 00:30 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
* 00:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:08 legoktm: uploaded mailman3 3.2.1-1+wmf1, postorius 1.2.4-1+wmf1 to apt.wikimedia.org
* 00:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox


== 2021-03-31 ==
== 2023-05-26 ==
* 23:34 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/Wikibase/client/includes/DataAccess/Scribunto/: {{Gerrit|bfc8f55196f57e43c0abc8a16d81cb3b390ac94a}}: Eliminate another php.getSetting() from Lua code (duration: 01m 09s)
* 23:48 tzatziki: removing 2 files for legal compliance
* 23:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/Wikibase/client/includes/DataAccess/Scribunto/: {{Gerrit|ad564a098f9174d76ff5c95adec20064ddde7bc9}}: Eliminate another php.getSetting() from Lua code (duration: 01m 10s)
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:12 jhuneidi@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:674698{{!}}Include private folder in restricted image (T276145)]] (duration: 01m 08s)
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:668241{{!}}Use the new mediawiki logos]], part II ([[phab:T268230|T268230]]) (duration: 01m 11s)
* 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 23:03 ladsgroup@deploy1002: Synchronized static: [[gerrit:668241{{!}}Use the new mediawiki logos]], part I ([[phab:T268230|T268230]]) (duration: 01m 09s)
* 20:47 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 22:58 Urbanecm: Start server side upload for 3 files
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:01 Urbanecm: Server side upload of three video files ([[phab:T279011|T279011]], [[phab:T278956|T278956]], [[phab:T278955|T278955]])
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:01 eileen: civicrm revision changed from {{Gerrit|2fcea570bd}} to {{Gerrit|740e49d868}}, config revision is {{Gerrit|6779e3829a}}
* 19:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 20:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 20:00 dwisehaupt: shifted payments2003 to use gtid for mysql replication.
* 19:15 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 19:55 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:15 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 19:21 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.37 refs [[phab:T278343|T278343]] (duration: 01m 08s)
* 18:26 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 19:20 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 17:38 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 06m 10s)
* 19:18 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:31 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs [[phab:T330216|T330216]]
* 19:13 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 16:37 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard2003.codfw.wmnet with OS bookworm
* 19:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 19:03 twentyafterfour@deploy1002: Synchronized php-1.36.0-wmf.37/includes/Revision/RevisionRecord.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/675875 to unblock train refs  [[phab:T278376|T278376]] [[phab:T278343|T278343]] (duration: 00m 58s)
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36  refs [[phab:T278343|T278343]]
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 17:49 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 15:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 17:41 twentyafterfour: The train is now unblocked, promoting to group0 refs [[phab:T278343|T278343]]
* 15:50 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 17:01 Urbanecm: Server side upload of three video files ([[phab:T278959|T278959]], [[phab:T278958|T278958]], [[phab:T278957|T278957]])
* 15:41 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:40 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 14:57 papaul: disconnecting ps1-d8-codfw for replacement
* 15:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1007.eqiad.wmnet
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:02 Urbanecm: Server side upload of two video files ([[phab:T278961|T278961]], [[phab:T278960|T278960]])
* 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 13:48 jynus: retrying s3 snapshot on codfw
* 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 13:39 akosiaris: revert mw1412, mw1413, wtp1032, mw2305 to the previous state for [[phab:T278220|T278220]]
* 15:31 nskaggs@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
* 13:34 akosiaris: disabling puppet on role::mediawiki::appserver, role::mediawiki::appserver::api, role::mediawiki::maintenance, role::mediawiki::jobrunner, role::parsoid, role::parsoid::testing [[phab:T278220|T278220]]
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 13:00 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters. The video transcoding backlog has been served, so we can return to "normal"
* 15:08 nskaggs@cumin1001: START - Cookbook sre.wikireplicas.update-views
* 12:59 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters
* 14:26 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=videoscaler,dc=eqiad,name=parse.*
* 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=parse.*
* 12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 11:38 awight: EU deployment complete
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 11:38 awight@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo: Backport: [[gerrit:675882{{!}}Style change to mediasearch logged-in notice close (T274927)]] [[gerrit:675883{{!}}Suppress user notice on mobile (T274927)]] [[gerrit:675881{{!}}Reset namespace filter on cancel (T276261)]] (duration: 01m 08s)
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
* 11:26 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:675509{{!}}vector: Disable WVUI search widget treatment A/B test (T276917)]] (duration: 01m 08s)
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 10:48 effie: enable puppet on all mw* servers
* 14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 10:10 effie: disable puppet on all mw* hosts
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
* 09:03 hashar: contint2001: enable puppet again
* 14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
* 08:38 hashar: contint2001: stopping Puppet for an Apache config live hack
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 04:35 eileen: civicrm revision changed from {{Gerrit|7040b68c11}} to {{Gerrit|2fcea570bd}}, config revision is {{Gerrit|6779e3829a}}
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 02:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 02:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
* 02:22 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 02:17 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 02:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
* 14:02 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 02:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
* 02:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
* 02:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
* 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 01:15 urbanecm@deploy1002: Synchronized wmf-config/config/gawiki.yaml: {{Gerrit|3283ae59f25f02966a81ed2f0b51b964f733cf65}}: Enable local uploads on Irish Wikipedia ([[phab:T277723|T277723]]) (duration: 01m 08s)
* 14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 01:13 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: {{Gerrit|3283ae59f25f02966a81ed2f0b51b964f733cf65}}: Enable local uploads on Irish Wikipedia ([[phab:T277723|T277723]]) (duration: 01m 08s)
* 13:58 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 01:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
* 13:58 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetboard2003.codfw.wmnet
* 01:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:56 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 13:56 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb1003.eqiad.wmnet
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:51 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 13:46 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 13:46 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 13:45 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 13:06 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 12:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 12:41 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 12:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]]
* 11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
* 11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]]
* 10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
* 10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
* 09:54 effie: pool parse1013-parse1016 to the jobrunner cluster - [[phab:T329366|T329366]]
* 09:29 jbond: disable puppet fleet wide to deploy minor puppet change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923353
* 09:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1016.eqiad.wmnet with OS buster
* 09:26 effie: parse1013-parse1016 have been depooled and removed from the parsoid-php service - [[phab:T329366|T329366]]
* 09:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1014.eqiad.wmnet with OS buster
* 09:24 jnuche@deploy1002: Installation of scap version "4.52.3" completed for 596 hosts
* 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1013.eqiad.wmnet with OS buster
* 09:23 jnuche@deploy1002: Installing scap version "4.52.3" for 596 hosts
* 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 09:13 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 09:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parse1015.eqiad.wmnet with OS buster
* 08:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 08:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1016.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1015.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1014.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1013.eqiad.wmnet with OS buster
* 08:10 jiji@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse101[3-6].eqiad.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48591 and previous config saved to /var/cache/conftool/dbconfig/20230526-075903-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48590 and previous config saved to /var/cache/conftool/dbconfig/20230526-075809-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48589 and previous config saved to /var/cache/conftool/dbconfig/20230526-074358-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48588 and previous config saved to /var/cache/conftool/dbconfig/20230526-074304-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48587 and previous config saved to /var/cache/conftool/dbconfig/20230526-072854-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48586 and previous config saved to /var/cache/conftool/dbconfig/20230526-072759-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48585 and previous config saved to /var/cache/conftool/dbconfig/20230526-071349-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48584 and previous config saved to /var/cache/conftool/dbconfig/20230526-071255-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48583 and previous config saved to /var/cache/conftool/dbconfig/20230526-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48582 and previous config saved to /var/cache/conftool/dbconfig/20230526-065750-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48581 and previous config saved to /var/cache/conftool/dbconfig/20230526-064340-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48580 and previous config saved to /var/cache/conftool/dbconfig/20230526-064245-root.json
* 06:42 elukey: `apt-get clean` on stat1008 to clean up some space in the root partition
* 06:36 elukey: `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to keep the root partition from filling up
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48579 and previous config saved to /var/cache/conftool/dbconfig/20230526-062835-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48578 and previous config saved to /var/cache/conftool/dbconfig/20230526-062741-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48577 and previous config saved to /var/cache/conftool/dbconfig/20230526-061330-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48576 and previous config saved to /var/cache/conftool/dbconfig/20230526-061236-root.json
* 03:51 fab@deploy1002: Finished deploy [airflow-dags/research@77cf676]: (no justification provided) (duration: 00m 17s)
* 03:51 fab@deploy1002: Started deploy [airflow-dags/research@77cf676]: (no justification provided)


== 2021-03-30 ==
== 2023-05-25 ==
* 23:59 Trey314159: reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 22:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] (duration: 09m 14s)
* 23:56 legoktm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default ([[phab:T278867|T278867]]) (duration: 01m 08s)
* 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 23:53 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default ([[phab:T278867|T278867]]) (duration: 01m 08s)
* 22:05 zabe@deploy1002: Started scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]]
* 23:29 Amir1: sudo django-admin hyperkitty_import -l discovery-alerts@lists-next.wikimedia.org discovery-alerts.mbox/discovery-alerts.mbox --pythonpath /usr/share/mailman3-web --settings settings ([[phab:T278609|T278609]])
* 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
* 23:27 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676]: (no justification provided)
* 23:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:47 TheresNoTime: close UTC late backport
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef306a35464f295f43b874301cf0170edcfa4d8c}}: Growth features: bnwiki: Enable impact module ([[phab:T274793|T274793]]) (duration: 01m 07s)
* 20:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] (duration: 08m 34s)
* 22:52 cstone: civicrm revision changed from {{Gerrit|ad430721f6}} to {{Gerrit|7040b68c11}}
* 20:40 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:11 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: rollback (duration: 00m 12s)
* 20:38 samtar@deploy1002: Started scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]]
* 21:11 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: rollback
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] (duration: 10m 58s)
* 21:05 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: trying again with newly built zip (duration: 00m 12s)
* 20:22 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:05 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: trying again with newly built zip
* 20:21 samtar@deploy1002: Started scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]]
* 21:02 legoktm: scap pulling on mw1298
* 20:13 samtar@deploy1002: Finished scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] (duration: 08m 31s)
* 20:59 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 15s)
* 20:06 samtar@deploy1002: samtar and daimona: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:58 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 20:05 samtar@deploy1002: Started scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]]
* 20:58 legoktm: killed remaining ffmpeg on mw1298
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:56 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 12s)
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 20:56 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 19:31 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 20:53 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:29 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 20:52 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48575 and previous config saved to /var/cache/conftool/dbconfig/20230525-190946-root.json
* 20:41 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 20s)
* 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48574 and previous config saved to /var/cache/conftool/dbconfig/20230525-190859-root.json
* 20:41 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48573 and previous config saved to /var/cache/conftool/dbconfig/20230525-185441-root.json
* 20:41 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48572 and previous config saved to /var/cache/conftool/dbconfig/20230525-185354-root.json
* 20:40 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:43 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@6b27584]: (no justification provided) (duration: 00m 19s)
* 20:38 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
* 20:37 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
* 20:37 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 31s)
* 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
* 20:36 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
* 20:35 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 05s)
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
* 20:35 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
* 20:34 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
* 20:34 twentyafterfour@deploy1002: Started restart [releng/phatality@715d809]: (no justification provided)
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
* 20:33 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]] (duration: 80m 32s)
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
* 20:29 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 49s)
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
* 20:29 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1307.eqiad.wmnet
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1306.eqiad.wmnet
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1305.eqiad.wmnet
* 17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1304.eqiad.wmnet
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
* 20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1303.eqiad.wmnet
* 17:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1307.eqiad.wmnet
* 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48558 and previous config saved to /var/cache/conftool/dbconfig/20230525-172326-root.json
* 20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1306.eqiad.wmnet
* 17:15 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1305.eqiad.wmnet
* 17:14 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1304.eqiad.wmnet
* 17:14 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1303.eqiad.wmnet
* 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 20:26 twentyafterfour: preparing to deploy phatality upgrade to kibana cluster
* 17:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1296.eqiad.wmnet
* 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1298.eqiad.wmnet
* 17:09 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1299.eqiad.wmnet
* 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 20:21 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a] (duration: 04m 29s)
* 17:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1299.eqiad.wmnet
* 17:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1298.eqiad.wmnet
* 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1296.eqiad.wmnet
* 17:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a]
* 16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct ([[phab:T328313|T328313]])
* 20:16 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a] (duration: 00m 07s)
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
* 20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a]
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
* 20:15 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a] (duration: 17m 11s)
* 16:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 20:02 twentyafterfour: while syncing the 1.36.0-wmf.37 promotion to testwikis, one server (mw1298.eqiad.wmnet) failed and two more appear to be hung: scap has been stuck at 99% with 2 hosts left for a long time without making any progress. refs [[phab:T278343|T278343]]
* 16:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 19:58 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 16:14 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 19:58 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a]
* 16:14 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 19:58 bblack: repool cp1087 - [[phab:T278729|T278729]]
* 16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.37  refs [[phab:T278343|T278343]]
* 16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 18:15 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 18:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 17:22 legoktm: moved mw[1293-1295] to jobrunners and mw[1300-1302] to videoscalers
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
* 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1302.eqiad.wmnet
* 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
* 17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1301.eqiad.wmnet
* 15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1300.eqiad.wmnet
* 15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1302.eqiad.wmnet
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1301.eqiad.wmnet
* 15:49 dancy@deploy1002: Finished deploy [integration/docroot@dac2b70]: Updated Scap URLs (duration: 00m 07s)
* 17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1300.eqiad.wmnet
* 15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1295.eqiad.wmnet
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1294.eqiad.wmnet
* 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1293.eqiad.wmnet
* 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 17:19 legoktm: killed all ffmpeg processes on mw1294
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1295.eqiad.wmnet
* 15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1293.eqiad.wmnet
* 15:43 dancy@deploy1002: Finished deploy [integration/docroot@78e6f40]: (no justification provided) (duration: 00m 10s)
* 17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1294.eqiad.wmnet
* 15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
* 17:13 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
* 17:12 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 17:05 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
* 16:40 effie: enable puppet on mw* hosts
* 15:27 kartik@deploy1002: Finished scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 07m 01s)
* 16:10 mutante: mw1296 - started ferm
* 15:22 kartik@deploy1002: kartik: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 16:10 mutante: mw1308 - started ferm
* 15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 16:07 akosiaris: split jobrunners/videoscalers clusters in conftool. mw12* become videoscalers, mw13* become jobrunners, killing ffmpeg on mw13*
* 15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 16:07 mutante: mw1309 - systemctl start ferm
* 15:20 kartik@deploy1002: Started scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 16:07 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=jobrunner,name=mw12.*
* 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json
* 16:06 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw13.*
* 15:18 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 68m 07s)
* 16:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
* 15:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
* 15:59 akosiaris: depool a number of hosts from videoscalers
* 15:10 topranks: Migrating cr1-eqiad downlink to row E/F from lsw1-e1-eqiad et-0/0/48 to ssw1-e1-eqiad et-0/0/31
* 15:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
* 15:10 mutante: gerrit-replica.wikimedia.org - gerrit2002 - reimaging - scheduled maintenance
* 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet,service=jobrunner
* 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet,service=jobrunner
* 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
* 15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:29 hnowlan: moving all test tables out of Cassandra directories on aqs hosts
* 15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 14:59 effie: disable puppet on mediawiki servers to deploy 663565
* 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json
* 14:58 Urbanecm: Move Help talk:Help talk:Getting started --> Help talk:Getting started via moveBatch.php on enwiki ([[phab:T278350|T278350]])
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json
* 14:32 arturo: manually start update-openstack-mirror.service on sodium ([[phab:T278505|T278505]])
* 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 13:02 jbond42: rollout lxml update [[phab:T278822|T278822]]
* 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 12:55 jbond42: update SpamAssassin on lists, otrs and mx [[phab:T278820|T278820]]
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json
* 12:39 Amir1: ssh -p 29418 gerrit.wikimedia.org replication start wikidata/query-builder --wait ([[phab:T277060|T277060]])
* 14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 [[phab:T337446|T337446]]
* 12:38 jbond42: update python(3)-pygments
* 14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 12:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
* 12:14 Urbanecm: mwmaint1002: Downloading multiple big files (total filesize estimated 150 GB, downloaded and processed in batches) for server-side uploads
* 14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
* 11:21 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:675751{{!}}Disable legacy javascript global variables in group1]], Some increase in client errors is expected ([[phab:T72470|T72470]]) (duration: 01m 11s)
* 14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
* 09:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
* 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
* 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027']
* 09:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 09:41 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
* 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
* 09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
* 09:05 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1025']
* 09:04 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1024']
* 08:36 jynus: mariadb upgrade of all buster source backup hosts to 10.4.18 [[phab:T250666|T250666]]
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 08:05 dcausse: refreshing wdqs entities ([[phab:T278693|T278693]])
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1022']
* 07:37 elukey: restart-php7.2-fpm on mw1304, jobrunner completely overwhelmed by ffmpeg/transcode jobs (not publishing metrics, erroring out for memcached timeouts) - [[phab:T278734|T278734]]
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 07:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36 - [[phab:T274940|T274940]]
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 06:06 elukey: powercycle cp1087 (no ssh, no mgmt console tty)
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 06:04 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
* 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 14:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 14:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=parsoid-php,dc=eqiad
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
* 14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:11 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:10 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
* 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 15m 56s)
* 13:53 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:52 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 13:46 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]] (duration: 07m 42s)
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:38 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]]
* 13:28 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] (duration: 09m 06s)
* 13:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:20 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:19 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]]
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s5, s2, s7', diff saved to https://phabricator.wikimedia.org/P48538 and previous config saved to /var/cache/conftool/dbconfig/20230525-121012-root.json
* 11:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48537 and previous config saved to /var/cache/conftool/dbconfig/20230525-113914-root.json
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48536 and previous config saved to /var/cache/conftool/dbconfig/20230525-112409-root.json
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:21 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:20 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:15 jbond: update udplog on mwlog server
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48535 and previous config saved to /var/cache/conftool/dbconfig/20230525-110948-root.json
* 11:09 jbond: upload udplog_1.10_amd64.deb
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48534 and previous config saved to /var/cache/conftool/dbconfig/20230525-110905-root.json
* 11:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 11:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48533 and previous config saved to /var/cache/conftool/dbconfig/20230525-105443-root.json
* 10:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
* 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48532 and previous config saved to /var/cache/conftool/dbconfig/20230525-105400-root.json
* 10:53 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 10:52 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:48 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
* 10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
* 10:32 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:24 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2005-dev.wikimedia.org
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48528 and previous config saved to /var/cache/conftool/dbconfig/20230525-102434-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48527 and previous config saved to /var/cache/conftool/dbconfig/20230525-102351-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48526 and previous config saved to /var/cache/conftool/dbconfig/20230525-101940-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48525 and previous config saved to /var/cache/conftool/dbconfig/20230525-100927-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48524 and previous config saved to /var/cache/conftool/dbconfig/20230525-100846-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48523 and previous config saved to /var/cache/conftool/dbconfig/20230525-100436-root.json
* 10:00 kart_: Updated cxserver to 2023-05-25-093623-production (config: language pairs transform fix + [[phab:T331201|T331201]])
* 09:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 09:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48522 and previous config saved to /var/cache/conftool/dbconfig/20230525-095423-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48521 and previous config saved to /var/cache/conftool/dbconfig/20230525-095341-root.json
* 09:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 09:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48520 and previous config saved to /var/cache/conftool/dbconfig/20230525-094931-root.json
* 09:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
* 09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000  to dumpsdata1006, copying all public xml dumps data
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
* 09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
* 08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - [[phab:T337248|T337248]]
* 07:52 matthiasmullie: UTC morning backports done
* 07:51 mlitn@deploy1002: Finished scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] (duration: 16m 12s)
* 07:37 mlitn@deploy1002: mlitn: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:35 mlitn@deploy1002: Started scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]]
* 07:18 mlitn@deploy1002: Finished scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] (duration: 14m 02s)
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P48511 and previous config saved to /var/cache/conftool/dbconfig/20230525-071719-root.json
* 07:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 07:06 mlitn@deploy1002: mlitn: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:04 mlitn@deploy1002: Started scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1196', diff saved to https://phabricator.wikimedia.org/P48509 and previous config saved to /var/cache/conftool/dbconfig/20230525-064418-root.json
* 06:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P48506 and previous config saved to /var/cache/conftool/dbconfig/20230525-055734-root.json
* 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P48504 and previous config saved to /var/cache/conftool/dbconfig/20230525-055236-root.json
* 05:48 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P48503 and previous config saved to /var/cache/conftool/dbconfig/20230525-051923-root.json
* 02:14 eileen: civicrm upgraded from {{Gerrit|b8cab6f6}} to {{Gerrit|415aa7e5}}


== 2021-03-29 ==
* 19:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
* 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
* 16:11 hnowlan: depooled aqs1004 for transfer of large tables to aqs1010
* 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:47 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 15:45 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:39 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
* 13:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
* 13:03 ema: cp4027: rollback luajit experiment https://github.com/apache/trafficserver/issues/7423#issuecomment-809354214
* 12:36 ema: cp4027: re-enable JIT compilation in all ats-be lua scripts -- https://github.com/apache/trafficserver/issues/7423
* 11:57 ema: cp4027: re-enable JIT compilation in normalize-path.lua -- https://github.com/apache/trafficserver/issues/7423
* 11:32 ema: cp4027: install libluajit 2.1.0~beta3+dfsg-6wm1 with P15083 applied -- https://github.com/apache/trafficserver/issues/7423
* 09:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 09:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 09:16 ryankemper: [[phab:T267927|T267927]] `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id [[phab:T267927|T267927]] --reload-data wikidata --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool`
* 09:15 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 08:47 filippo@deploy1002: Finished deploy [librenms/librenms@df69efe]: deploy {{Gerrit|I156f32925f693}} (duration: 00m 08s)
* 08:47 filippo@deploy1002: Started deploy [librenms/librenms@df69efe]: deploy {{Gerrit|I156f32925f693}}
* 07:59 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 06s)
* 07:58 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 07:54 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: Wrap most of functionalities depending on protect mode in a condition - [[phab:T278478|T278478]] (duration: 01m 08s)
* 07:49 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: [[gerrit:675161{{!}}Wrap most of functionalities depending on protect mode in a condition]] ([[phab:T278478|T278478]]) (duration: 01m 08s)
* 07:42 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]] [[phab:T268435|T268435]]


== 2023-05-24 ==
* 21:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] (duration: 09m 40s)
* 21:10 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:08 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]]
* 20:55 samtar@deploy1002: Finished scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 15s)
* 20:48 samtar@deploy1002: samtar: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:47 samtar@deploy1002: Started scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 31s)
* 20:18 samtar@deploy1002: samtar: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 20:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:08 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:12 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 19:06 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330216|T330216]]
* 18:55 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 18:49 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:22 ejegg: civicrm upgraded from {{Gerrit|4251dfa1}} to {{Gerrit|b8cab6f6}}
* 16:54 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance (duration: 00m 09s)
* 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance
* 16:05 elukey: move kafka mirror on kafka main brokers to PKI - [[phab:T337248|T337248]]
* 16:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] (duration: 08m 33s)
* 15:56 elukey: move kafka mirror on kafka jumbo brokers to PKI - [[phab:T337248|T337248]]
* 15:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 15:52 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]]
* 15:47 ejegg: payments-wiki upgraded from {{Gerrit|e02bc7c5}} to {{Gerrit|c2f9f8b5}}
* 15:39 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] (duration: 01m 35s)
* 15:38 ejegg: standalone SmashPig upgraded from {{Gerrit|5460dbe2}} to {{Gerrit|db23b998}}
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363]
* 15:37 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] (duration: 00m 04s)
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363]
* 15:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:32 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:31 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:31 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] (duration: 06m 13s)
* 15:31 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:30 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:25 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:25 aqu@deploy1002: Started deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363]
* 15:24 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:18 aqu: analytics-refinery, about to deploy
* 15:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 14:19 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] (duration: 12m 11s)
* 14:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]] (duration: 00m 07s)
* 14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]]
* 14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]]
* 14:06 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] (duration: 09m 21s)
* 13:58 urbanecm@deploy1002: matmarex and urbanecm and sgimeno: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] synced t
* 13:56 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]]
* 13:55 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]] (duration: 07m 06s)
* 13:48 urbanecm@deploy1002: Started scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]]
* 13:36 samtar@deploy1002: Finished scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] (duration: 08m 04s)
* 13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:28 samtar@deploy1002: Started scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]]
* 13:26 samtar@deploy1002: Finished scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] (duration: 10m 15s)
* 13:17 samtar@deploy1002: samtar and dcausse: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:16 samtar@deploy1002: Started scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]]
* 13:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] (duration: 07m 53s)
* 13:07 samtar@deploy1002: herron and samtar: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:07 xSavitar: tools.codesearch Deployed https://gerrit.wikimedia.org/r/c/labs/codesearch/+/909258 and also restarted tool instances to core search backend was dead.
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]]
* 12:55 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript findBadBlobs --wiki nowiki --revisions {{Gerrit|5227369}} --mark [[phab:T337392|T337392]]` [[phab:T337392|T337392]]
* 12:47 tgr_: running changeWikiConfig.php on Growth pilot wikis for [[phab:T337348|T337348]]
* 10:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
* 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
* 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
* 09:04 dcausse@deploy1002: Finished deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset (duration: 00m 17s)
* 09:04 dcausse@deploy1002: Started deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset
* 08:52 claime: repooling mw2248.codfw.wmnet - [[phab:T334429|T334429]]
* 08:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
* 08:50 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
* 08:49 marostegui: Stop mariadb on db1154 (sanitarium) there will be lag on clouddb* hosts
* 08:36 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]] (duration: 07m 20s)
* 08:28 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]]
* 07:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 07:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 07:41 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 07:40 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
* 05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
* 01:19 mutante: contint2001 - jenkins started again
* 01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:45 mutante: short maintenance on main contint server (jenkins)
* 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance


== 2021-03-27 ==
* 19:25 elukey: powercycle elastic1060 - [[phab:T278630|T278630]]
* 06:10 ryankemper: [[phab:T267927|T267927]] `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_dumps_2020-03-26`
* 05:44 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:42 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:42 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload


== 2023-05-23 ==
* 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
* 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
* 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - [[phab:T324659|T324659]]
* 22:00 eileen: civicrm upgraded from {{Gerrit|11538e23}} to {{Gerrit|4251dfa1}}
* 21:26 ejegg: payments-wiki upgraded from {{Gerrit|a7567c6a}} to {{Gerrit|e02bc7c5}}
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:02 TheresNoTime: close UTC late backport window
* 21:01 samtar@deploy1002: Finished scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] (duration: 11m 47s)
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:51 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:50 samtar@deploy1002: Started scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]]
* 20:48 samtar@deploy1002: Finished scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] (duration: 11m 20s)
* 20:38 samtar@deploy1002: samtar: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:37 ejegg: civicrm upgraded from {{Gerrit|efe25c9b}} to {{Gerrit|11538e23}}
* 20:37 samtar@deploy1002: Started scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]]
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:42 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 19:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 19:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1027
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1027
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1026
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1026
* 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
* 19:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:31 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1025
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 19:30 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
* 19:25 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
* 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1022
* 19:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 19:24 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1022
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts [[phab:T337327|T337327]]
* 18:26 ryankemper: [WDQS] [[phab:T337327|T337327]] Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
* 18:16 ryankemper: [WDQS] Rolled back requestctl rule
* 18:12 ryankemper: [WDQS] [[phab:T337327|T337327]] New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:03 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]] and [[phab:T333140|T333140]]
* 17:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
* 16:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:50 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]], part 2
* 16:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:42 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]]
* 16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:31 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Rename page content change enrich error stream to match convention - [[phab:T336656|T336656]] (duration: 06m 58s)
* 16:22 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]] (duration: 36m 02s)
* 15:56 topranks: moving lvs1018 connection to rack E1 from lsw1-e1-eqiad to ssw1-e1-eqiad [[phab:T322937|T322937]]
* 15:46 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]]
* 15:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 sukhe: stop pybal on lvs1018: [[phab:T322937|T322937]]
* 15:38 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bullseye
* 15:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:20 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:20 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:19 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 15:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bullseye
* 15:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases1003.eqiad.wmnet with OS bullseye
* 15:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:00 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
* 14:58 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:58 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
* 14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
* 14:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 herron@cumin1001: START - Cookbook sre.dns.netbox
* 14:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
* 13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
* 13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:50 herron@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
* 13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:45 hoo@deploy1002: Finished scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] (duration: 12m 49s)
* 13:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 13:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 13:43 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases1003.eqiad.wmnet
* 13:33 hoo@deploy1002: hoo: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:32 hoo@deploy1002: Started scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]]
* 13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
* 10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
* 09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
* 09:50 stevemunene: reboot an-test-master1002.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
* 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48492 and previous config saved to /var/cache/conftool/dbconfig/20230523-094216-root.json
* 09:42 stevemunene: reboot an-test-worker1003.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
* 09:34 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48491 and previous config saved to /var/cache/conftool/dbconfig/20230523-092711-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48490 and previous config saved to /var/cache/conftool/dbconfig/20230523-091207-root.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48489 and previous config saved to /var/cache/conftool/dbconfig/20230523-085702-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48488 and previous config saved to /var/cache/conftool/dbconfig/20230523-085246-root.json
* 08:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1122.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48485 and previous config saved to /var/cache/conftool/dbconfig/20230523-082653-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48484 and previous config saved to /var/cache/conftool/dbconfig/20230523-082237-root.json
* 08:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] (duration: 08m 37s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]]
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48480 and previous config saved to /var/cache/conftool/dbconfig/20230523-075227-root.json
* 07:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:51 hashar@deploy1002: Started deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:47 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] (duration: 07m 19s)
* 07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:41 marostegui@deploy1002: marostegui: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:39 marostegui@deploy1002: Started scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48479 and previous config saved to /var/cache/conftool/dbconfig/20230523-073841-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48478 and previous config saved to /var/cache/conftool/dbconfig/20230523-073722-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48477 and previous config saved to /var/cache/conftool/dbconfig/20230523-073710-root.json
* 07:36 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T337285|T337285]]
* 07:25 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] (duration: 07m 16s)
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
* 07:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]]
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 42s)
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48475 and previous config saved to /var/cache/conftool/dbconfig/20230523-070713-root.json
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48474 and previous config saved to /var/cache/conftool/dbconfig/20230523-070547-root.json
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]]
* 07:00 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] (duration: 06m 58s)
* 06:54 marostegui@deploy1002: marostegui: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 06:53 marostegui@deploy1002: Started scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48473 and previous config saved to /var/cache/conftool/dbconfig/20230523-065042-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change es1020 weight', diff saved to https://phabricator.wikimedia.org/P48472 and previous config saved to /var/cache/conftool/dbconfig/20230523-064850-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48471 and previous config saved to /var/cache/conftool/dbconfig/20230523-064820-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48470 and previous config saved to /var/cache/conftool/dbconfig/20230523-064729-root.json
* 06:46 marostegui: Starting es4 eqiad failover from es1021 to es1020 - [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48469 and previous config saved to /var/cache/conftool/dbconfig/20230523-063836-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48468 and previous config saved to /var/cache/conftool/dbconfig/20230523-063538-root.json
* 06:26 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] (duration: 08m 21s)
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48467 and previous config saved to /var/cache/conftool/dbconfig/20230523-062033-root.json
* 06:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 06:18 marostegui@deploy1002: Started scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]]
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48466 and previous config saved to /var/cache/conftool/dbconfig/20230523-060528-root.json
* 06:04 kart_: cxserver: Remove Flores MT service ([[phab:T331505|T331505]])
* 06:03 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48465 and previous config saved to /var/cache/conftool/dbconfig/20230523-055024-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48464 and previous config saved to /var/cache/conftool/dbconfig/20230523-053519-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48463 and previous config saved to /var/cache/conftool/dbconfig/20230523-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.8 (duration: 02m 17s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 49m 04s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 02:57 eileen: civicrm upgraded from {{Gerrit|3329155a}} to {{Gerrit|6642b602}}
* 02:22 eileen: civicrm upgraded from {{Gerrit|7eae24d5}} to {{Gerrit|3329155a}}


== 2021-03-26 ==
* 22:27 tzatziki: reset password for Philroc
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 20:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 17:44 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - [[phab:T277795|T277795]] (duration: 01m 06s)
* 17:42 hashar@deploy1002: Finished scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]] (duration: 31m 43s)
* 17:10 hashar@deploy1002: Started scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]]
* 15:40 Urbanecm: Delete `commonswiki:ip-autoblock:whitelist` cache key from memcached (wmf.36 moves the autoblock whitelist source, and it was deployed on commonswiki for a while, resulting in the cache key being empty)
* 15:37 hnowlan: importing imposm3_0.11.0+git20201104.4758cf4-1_amd64.changes on apt1001
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
* 13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
* 13:02 moritzm: reimaging theemin [[phab:T275873|T275873]]
* 12:56 moritzm: drain ganeti1014
* 12:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 12:37 moritzm: drain ganeti1013
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
* 12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
* 10:55 Urbanecm: Move `Help talk:Getting Started --> Help talk:Getting started` on enwiki with `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` ([[phab:T278350|T278350]])
* 10:49 Urbanecm: Move `User talk:TheAafi/Help talk` to `Help talk:Getting Started` via `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` to fix an UBN task ([[phab:T278350|T278350]])
* 10:10 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts chlorine.eqiad.wmnet
* 10:02 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts chlorine.eqiad.wmnet
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts argon.eqiad.wmnet
* 09:49 filippo@deploy1002: Finished deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}} (duration: 00m 08s)
* 09:49 filippo@deploy1002: Started deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}}
* 09:46 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts argon.eqiad.wmnet
* 09:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts acrab.codfw.wmnet
* 09:43 moritzm: delete fermium in Ganeti (was still around, but powered down) [[phab:T224586|T224586]]
* 09:38 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts acrux.codfw.wmnet
* 09:36 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrab.codfw.wmnet
* 09:32 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrux.codfw.wmnet
* 09:31 filippo@deploy1002: Finished deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}} (duration: 00m 12s)
* 09:31 filippo@deploy1002: Started deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}}
* 09:28 moritzm: drain ganeti1012
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
* 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
* 08:38 moritzm: drain ganeti1010
* 08:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
* 08:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
* 06:11 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:06 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@bb5a072]: 0.3.68 (duration: 07m 31s)
* 05:00 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.68` on canary `wdqs1003`; proceeding to rest of fleet
* 04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@bb5a072]: 0.3.68
* 04:58 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.68`. Pre-deploy tests passing on canary `wdqs1003`


== 2023-05-22 ==
* 23:29 eileen: civicrm upgraded from {{Gerrit|cc9593d0}} to {{Gerrit|7eae24d5}}
* 23:16 zabe@deploy1002: Finished scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] (duration: 06m 58s)
* 23:11 zabe@deploy1002: zabe: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:09 zabe@deploy1002: Started scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]]
* 21:38 sbassett: Deployed security mitigations for [[phab:T333140|T333140]] and [[phab:T336027|T336027]]
* 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet
* 20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 20:51 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet
* 20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 20:33 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1005.eqiad.wmnet
* 20:27 TheresNoTime: close UTC late backport window
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] (duration: 07m 47s)
* 20:17 samtar@deploy1002: samtar and superpes: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]]
* 20:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] (duration: 08m 22s)
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 20:09 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 20:08 samtar@deploy1002: superpes and samtar: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:06 samtar@deploy1002: Started scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]]
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:04 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) (duration: 00m 17s)
* 17:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided)
* 16:58 XioNoX: push mgmt_junos to all L2 switches
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2009.codfw.wmnet
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
* 15:57 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
* 15:56 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2009.codfw.wmnet
* 15:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 15:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 15:25 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 15:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 14:32 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:31 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:10 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm
* 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 12:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor1003.eqiad.wmnet with OS bookworm
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P48456 and previous config saved to /var/cache/conftool/dbconfig/20230522-115936-root.json
* 11:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor1003.eqiad.wmnet with OS bookworm
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path [[phab:T337220|T337220]]
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path
* 10:06 hashar@deploy1002: Finished scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] (duration: 37m 00s)
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors
* 10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 10:02 moritzm: installing updated usb.ids packages for Bullseye
* 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors
* 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor1003.eqiad.wmnet
* 09:39 hashar@deploy1002: hashar: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:29 hashar@deploy1002: Started scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]]
* 08:46 marostegui: Stop mysql on db2160 (haproxy irc alerts will be generated)
* 08:28 elukey: drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin
* 08:22 moritzm: installing systemd security updates
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json
* 07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json
* 07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json
* 07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json
* 06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json
* 06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primary T337203', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json
* 05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json


== 2023-05-21 ==
* 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply


== 2021-03-25 ==
* 23:47 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/3D/package.json: No-op demo sync (duration: 01m 07s)
* 23:37 stran@deploy1002: Synchronized README: (no justification provided) (duration: 01m 06s)
* 23:20 jhuneidi@deploy1002: Synchronized README: [[gerrit:674984{{!}}DEMO: README]] (duration: 01m 07s)
* 22:59 brennen: no patches for upcoming deploy window, but we'll be conducting a deployment training using DEMO patches to READMEs.
* 22:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:27 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 19:48 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 and 2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 19:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 19:36 hashar@deploy1002: sync-wikiversions aborted: (no justification provided) (duration: 00m 03s)
* 19:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 08s)
* 19:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 06s)
* 18:59 Urbanecm: `mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' ` finished ([[phab:T275337|T275337]])
* 18:53 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sturm . # [[phab:T278391|T278391]]
* 18:50 Urbanecm: mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' # [[phab:T275337|T275337]]
* 18:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|39cd4f15a3900783ac0e9a213004a28f18298a23}}: ruwiki: flaggedrevs: Do not allow sysops to modify users in autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 09s)
* 18:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcfb7feaace1f397169e5e1bab7efd4e5f605a0f}}: ruwiki: flaggedrevs: Do not remove autoreview group ([[phab:T275337|T275337]]) (duration: 01m 14s)
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|3fb664682bea3c4d1448b0937f938e810268bac3}}: ruwiki: flaggedrevs: Revoke review from sysop group ([[phab:T275811|T275811]]) (duration: 01m 06s)
* 18:29 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (3/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (2/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 18:26 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (1/3; [[phab:T275819|T275819]]) (duration: 01m 10s)
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|62be4e738a4fd45256027bb09b010ab152f19850}}: Disable magic links on enwiki ([[phab:T275951|T275951]]) (duration: 01m 20s)
* 18:14 mutante: alert1001 - sudo systemctl restart tcpircbot-logmsgbot
* 18:09 marxarelli: scap sync-file .pipeline Config: [[gerrit:674132{{!}}Include patches in restricted image (T271274)]]
* 18:06 hnowlan: draining and restarting aqs1004-b cassandra
* 17:45 hnowlan: draining and restarting aqs1004-a cassandra
* 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:14 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:08 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:39 hashar: Restarted Apache 2 on contint2001 / contint1001
* 16:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 16:32 moritzm: restarting apache on an-tool1007/turnilo
* 16:27 moritzm: restarting dnsdist/rdns-recursor on malmok
* 16:24 jbond42: restart slapd on ldap-replica
* 16:22 jbond42: restart slapd on ldap-corp
* 16:20 jbond42: restart apache on lists1002
* 16:18 jbond42: restart apache on netbox
* 16:13 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Disallow negative or decimal values in pages tag - [[phab:T278400|T278400]] (duration: 01m 32s)
* 16:12 jbond42: restart routinator on rpki*
* 16:12 moritzm: restarting nginx on apt*
* 16:10 moritzm: restarting apache on dbmonitor
* 16:08 moritzm: restart Apache on matomo/piwik
* 16:03 jbond42: restart apache service on gerrit
* 16:02 jbond42: restart idp service
* 16:01 ema: A:cp rolling ats-<nowiki>{</nowiki>tls,backend<nowiki>}</nowiki>-restart for openssl upgrades -- https://www.openssl.org/news/secadv/20210325.txt
* 15:45 moritzm: installing openssl updates on buster
* 14:48 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
* 14:13 twentyafterfour: update phabricator again (last night's update undid a hotfix that is now fixed properly)
* 13:45 moritzm: drain ganeti1009
* 13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 13:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 13:27 moritzm: reduce webperf1001/webperf2001 to 4G RAM (xhgui has been split off to separate VMs)
* 13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
* 13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
* 12:52 hnowlan: aqs1004 nodetool-a cleanup finished
* 12:14 moritzm: drain ganeti1008
* 12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
* 11:52 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:674861{{!}}Disable Legacy javascript in fawikiquote]] ([[phab:T72470|T72470]]) (duration: 01m 07s)
* 11:46 moritzm: drain ganeti1007
* 11:44 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/skins/Vector/resources: [[gerrit:674382{{!}}Inform anonymous A/B test by tracking time from navigationStart (T275807)]] (duration: 01m 09s)
* 11:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
* 11:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
* 11:33 ladsgroup@deploy1002: Synchronized dblists/: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]], Part II ([[phab:T278369|T278369]]) (duration: 01m 07s)
* 11:32 ladsgroup@deploy1002: Synchronized wmf-config: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]] ([[phab:T278369|T278369]]) (duration: 01m 30s)
* 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 11:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
* 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 11:10 moritzm: drain ganeti1006
* 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
* 10:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 10:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
* 10:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
* 10:42 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
* 10:36 hnowlan: running general nodetool cleanup on aqs1004-a
* 10:35 hnowlan: running cleanup on aqs1004-a: nodetool-a cleanup "local_group_default_T_pageviews_per_project_v2" data
* 10:34 moritzm: drain ganeti1005
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
* 10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 10:24 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
* 10:18 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 10:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 09:26 moritzm: drain ganeti2024
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
* 09:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
* 08:45 moritzm: drain ganeti2023
* 08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
* 08:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
* 08:12 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2 for buster-wikimedia
* 08:11 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2
* 07:41 legoktm: upgraded lists1002 to hyperkitty 1.2.2-1+wmf1 ([[phab:T276687|T276687]])
* 07:36 legoktm: uploaded hyperkitty 1.2.2-1+wmf1 to buster-wikimedia ([[phab:T276687|T276687]])
* 07:35 jynus: restart db2135 [[phab:T278408|T278408]] [[phab:T273281|T273281]]
* 07:05 effie: enable puppet on all mediawiki servers
* 06:57 XioNoX: Option 82: use-vlan-id
* 06:53 effie: enable puppet on jobrunners
* 06:47 effie: enable puppet on parsoid
* 06:40 effie: disable puppet on all mediawiki servers to merge 673061 (service proxy to listen on ::1)
* 06:23 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 05:19 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 04:44 legoktm: restarted exim4 on lists1002 so it listens on 0.0.0.0 instead of 127.0.0.1
* 04:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 03:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 01:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 01:10 legoktm: mailman3: added lists-next.wikimedia.org domain
* 01:08 legoktm: mailman3: renamed default site from "example.com" to "lists-next.wikimedia.org"
* 00:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2777.codfw.wmnet
* 00:34 mutante: mw2377, mw2378 - first scap pull
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2378.codfw.wmnet
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2377.codfw.wmnet
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2378.codfw.wmnet
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2377.codfw.wmnet
* 00:29 legoktm: syncing facts for puppet-compiler
* 00:23 mutante: mw2377, mw2378 - reboot
* 00:14 twentyafterfour: phabricator update complete
* 00:10 twentyafterfour: deploying phabricator
* 00:05 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T23:55:35` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`


== 2023-05-20 ==
* 18:25 effie: restart varnish cp3061
* 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
* 15:17 hoo@deploy1002: Finished scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] (duration: 08m 47s)
* 15:10 hoo@deploy1002: hoo: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:08 hoo@deploy1002: Started scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]]
* 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox


== 2021-03-24 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 23:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 23:48 mutante: generating new mcrouter certs for mw2377, mw2378
* 22:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 22:07 legoktm: disabled puppet on lists1002 while mailman3-web is broken
* 21:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:19 mutante: webperf2001 - restarted apache
* 21:11 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 07s)
* 21:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 21:07 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GrowthExperiments: LinkRecommendation: Modify path args for calls to API - [[phab:T277865|T277865]] (duration: 01m 07s)
* 21:05 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Revert "Add default TemplateStyles for an Index" - [[phab:T278379|T278379]] (duration: 01m 07s)
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 21:02 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GlobalUsage: Fix hook registration after class was namespaced - [[phab:T278375|T278375]] (duration: 01m 07s)
* 20:59 hashar@deploy1002: Synchronized wmf-config/env.php: multiversion: Move '@' operator in env.php closer to relevant statement (duration: 01m 07s)
* 20:56 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:30 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:26 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:13 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:10 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 20:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 19:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:57 ryankemper: [[phab:T267927|T267927]] Host key is missing for `wdqs2008` leading to `data-transfer` cookbook failing, looking into resolving
* 19:55 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:50 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:49 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:49 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:45 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 19:45 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:42 ryankemper: [[phab:T267927|T267927]] Re-enabled puppet on `wdqs2008` and ran puppet agent
* 19:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 19:14 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 to 1.36.0-wmf.35
* 19:07 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 21s)
* 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 19:03 urbanecm@deploy1002: Synchronized wmf-config/config/shwiki.yaml: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 3/3) (duration: 01m 08s)
* 19:02 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 2/3) (duration: 01m 06s)
* 19:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 1/3) (duration: 01m 07s)
* 18:54 urbanecm@deploy1002: Synchronized wmf-config/config/eswiki.yaml: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 3/3) (duration: 01m 06s)
* 18:53 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 2/3) (duration: 01m 07s)
* 18:52 urbanecm@deploy1002: sync-file aborted: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode (2/3) (duration: 00m 01s)
* 18:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 1/3) (duration: 01m 08s)
* 18:49 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:45 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 18:42 legoktm@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5aa050602954a3cab0c7e0c4b10efb0f957efb59}}: Promote several Growth target wikis out of dark mode ([[phab:T277491|T277491]]; [[phab:T276830|T276830]]; [[phab:T276123|T276123]]; [[phab:T276816|T276816]]; [[phab:T275550|T275550]]; [[phab:T276450|T276450]]) (duration: 01m 08s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|333393dfe59deb0ec4d7df6dd92372a705f65b85}}: Add autopatrol to autoreviewers in en.wikibooks ([[phab:T278300|T278300]]) (duration: 01m 09s)
* 18:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:25 effie: upgrade memcached on mc-gp* hosts
* 15:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 15:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 15:42 moritzm: reduce RAM for irc2001 to 2G, was originally created with 8G [[phab:T224579|T224579]]
* 15:35 effie: enable puppet on all mediawiki + memcached hosts
* 15:20 moritzm: drain ganeti2022
* 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
* 14:35 moritzm: drain ganeti2021
* 14:31 effie: disable puppet on all mediawiki servers + memcached for 674290
* 14:05 moritzm: failover Ganeti master in codfw to ganeti2019
* 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
* 13:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
* 13:29 moritzm: installing irc1001
* 13:15 moritzm: drain ganeti2020
* 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
* 12:28 effie: enabling puppet on mediawiki and memcached servers
* 12:10 jynus: restart dbprov200[12] [[phab:T271913|T271913]]
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15076 and previous config saved to /var/cache/conftool/dbconfig/20210324-115940-root.json
* 11:57 Andrew-WMDE_: EU deploys done
* 11:53 jynus: restart dbprov100[12] [[phab:T271913|T271913]]
* 11:51 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/MassMessage/: Backport: [[gerrit:674367{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 08s)
* 11:49 effie: disable puppet on all hosts running mediawiki+memcached to merge 674282
* 11:45 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/MassMessage/: Backport: [[gerrit:674366{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 07s)
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15075 and previous config saved to /var/cache/conftool/dbconfig/20210324-114436-root.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15074 and previous config saved to /var/cache/conftool/dbconfig/20210324-112932-root.json
* 11:22 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673326{{!}}Enable CodeMirror accessibility colors on initial wikis (T276346)]] (duration: 01m 08s)
* 11:15 jynus: restart serially db2097 db2098 db2099 db2100 [[phab:T271913|T271913]]
* 11:14 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673312{{!}}Enable bracket matching on group0 and wikitech (T273591)]] (duration: 01m 25s)
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15073 and previous config saved to /var/cache/conftool/dbconfig/20210324-111429-root.json
* 10:50 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1001.wikimedia.org
* 10:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:45 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:44 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:36 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host irc1001.wikimedia.org
* 10:31 jynus: restart db1171 [[phab:T271913|T271913]]
* 10:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:14 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:14 jynus: restart db1145 [[phab:T271913|T271913]]
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:03 jynus: restart db1139 [[phab:T271913|T271913]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15072 and previous config saved to /var/cache/conftool/dbconfig/20210324-095655-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15071 and previous config saved to /var/cache/conftool/dbconfig/20210324-095606-root.json
* 09:51 jynus: restart db1116 [[phab:T271913|T271913]]
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15070 and previous config saved to /var/cache/conftool/dbconfig/20210324-094102-root.json
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15069 and previous config saved to /var/cache/conftool/dbconfig/20210324-092558-root.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15068 and previous config saved to /var/cache/conftool/dbconfig/20210324-091055-root.json
* 08:29 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
* 08:16 gehel: restarting wdqs updater on all nodes for config change
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15066 and previous config saved to /var/cache/conftool/dbconfig/20210324-081057-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15065 and previous config saved to /var/cache/conftool/dbconfig/20210324-080725-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for schema change', diff saved to https://phabricator.wikimedia.org/P15064 and previous config saved to /var/cache/conftool/dbconfig/20210324-080223-marostegui.json
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-main
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15063 and previous config saved to /var/cache/conftool/dbconfig/20210324-075553-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15062 and previous config saved to /var/cache/conftool/dbconfig/20210324-075221-root.json
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-main
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-logging-external
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=zotero
* 07:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15061 and previous config saved to /var/cache/conftool/dbconfig/20210324-074050-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15060 and previous config saved to /var/cache/conftool/dbconfig/20210324-073718-root.json
* 07:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P15059 and previous config saved to /var/cache/conftool/dbconfig/20210324-072319-marostegui.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15058 and previous config saved to /var/cache/conftool/dbconfig/20210324-072214-root.json
* 07:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd2002.codfw.wmnet
* 07:10 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts ml-etcd2002.codfw.wmnet
* 07:09 moritzm: installing squid security updates
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1181 to dbctl, depooled [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15057 and previous config saved to /var/cache/conftool/dbconfig/20210324-063459-marostegui.json
* 06:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1084.eqiad.wmnet
* 06:14 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1084.eqiad.wmnet
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P15056 and previous config saved to /var/cache/conftool/dbconfig/20210324-055246-marostegui.json
* 04:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 03:41 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 03:41 ryankemper: [[phab:T274204|T274204]] Re-running the `codfw` restart; the timestamp argument should prevent it from wasting time on nodes that have already been rebooted
* 03:40 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 03:39 ryankemper: [[phab:T274204|T274204]] Timed out waiting for write queues to empty: `[59/60, retrying in 60.00s] Attempt to run 'spicerack.elasticsearch_cluster.ElasticsearchClusters.wait_for_all_write_queues_empty' raised: Write queue not empty (had value of 241631) for partition 0 of topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.`
* 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 02:38 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 02:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 01:59 ryankemper: [[phab:T274204|T274204]] For now I'll proceed to the reboots of `codfw`
* 01:59 ryankemper: [[phab:T274204|T274204]] `ctrl+c`'d out of run; relforge is relying on outdated config that is trying to talk to `relforge1002` which no longer exists. Need to refactor so that config no longer lives in spicerack
* 01:58 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade-reboot (exit_code=97)
* 01:49 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade-reboot relforge "relforge cluster restarts" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T01:45:59+00:00` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 01:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade-reboot
* 01:36 eileen: civicrm revision changed from {{Gerrit|f36a0b08f0}} to {{Gerrit|ad430721f6}}, config revision is {{Gerrit|26b02db7ba}}
* 00:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
* 00:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE


== 2021-03-23 ==
== 2023-05-19 ==
* 22:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 22:33 dwisehaupt: pushing {{Gerrit|60f9baaf50b}} to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - [[phab:T170321|T170321]]
* 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by
* 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s)
* 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace
* 22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:05 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 21:27 ppchelko@deploy1002: Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s)
* 21:09 ppchelko@deploy1002: Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint
* 21:04 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 robh@cumin1001: START - Cookbook sre.dns.netbox


== 2021-03-22 ==
== 2023-05-18 ==
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 23:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2250.codfw.wmnet
* 22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress
* 23:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:20 mutante: short down-time for zuul-merger on contint2001
* 23:18 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: [[phab:T262612|T262612]]: Start glent m1 ab test (duration: 01m 53s)
* 21:47 mutante: maintenance for zuul (CI) on contint servers
* 23:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 23:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2250.codfw.wmnet
* 21:13 brennen@deploy1002: Finished scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] (duration: 09m 38s)
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2249.codfw.wmnet
* 21:05 brennen@deploy1002: brennen: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:52 mutante: decom mw2249
* 21:03 brennen@deploy1002: Started scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]]
* 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2249.codfw.wmnet
* 21:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] (duration: 08m 09s)
* 21:08 sbassett: Deployed security patch for [[phab:T272244|T272244]]
* 20:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet,service=canary
* 20:53 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]]
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet,service=canary
* 20:36 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2279.codfw.wmnet,service=canary
* 20:33 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2278.codfw.wmnet,service=canary
* 20:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] (duration: 10m 25s)
* 19:50 mutante: gerrit2001 - restarted apache2 as well for consistency
* 20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 19:47 mutante: gerrit - restarting apache2 after we dropped MaxClients config line. This should make us fall back to Debian default MaxRequestWorkers. (since we use event MPM we should not be using MaxClients in the first place, says #httpd) ([[phab:T277127|T277127]])
* 20:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]]
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|25247c9cbba3d3741908164f2d15fb8497ce8b5e}}: hrwiki: Configure mentorship for Growth team features ([[phab:T275684|T275684]]) (duration: 01m 00s)
* 18:57 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]] (duration: 00m 35s)
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|951601f7a4c887f21e209b32dbd1cfd3da084816}}: Grant enwiki pagemovers the delete-redirect right ([[phab:T278131|T278131]]) (duration: 00m 59s)
* 18:56 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]]
* 17:30 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 18:55 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 16:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8  refs [[phab:T330215|T330215]]
* 16:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 16:47 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:46 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 18:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 18:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 16:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14990 and previous config saved to /var/cache/conftool/dbconfig/20210322-155808-root.json
* 18:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14989 and previous config saved to /var/cache/conftool/dbconfig/20210322-154304-root.json
* 18:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 15:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 15:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14988 and previous config saved to /var/cache/conftool/dbconfig/20210322-152800-root.json
* 18:07 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14987 and previous config saved to /var/cache/conftool/dbconfig/20210322-151257-root.json
* 18:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 14:26 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:59 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:38 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:23 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 17:37 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 17:36 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 17:35 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P14986 and previous config saved to /var/cache/conftool/dbconfig/20210322-141146-marostegui.json
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14985 and previous config saved to /var/cache/conftool/dbconfig/20210322-140800-root.json
* 17:27 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad - [[phab:T277771|T277771]]
* 17:26 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad
* 17:26 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14984 and previous config saved to /var/cache/conftool/dbconfig/20210322-135256-root.json
* 17:26 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14983 and previous config saved to /var/cache/conftool/dbconfig/20210322-133753-root.json
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:55 XioNoX: push new pfw policies - [[phab:T336896|T336896]]
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14982 and previous config saved to /var/cache/conftool/dbconfig/20210322-132249-root.json
* 16:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
* 13:16 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 15:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 12:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 15:57 inflatador: bking@cumin1001 starting rolling restart of wcqs for java updates [[phab:T334470|T334470]]
* 12:20 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:53 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 12:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 15:50 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P14981 and previous config saved to /var/cache/conftool/dbconfig/20210322-121924-marostegui.json
* 15:47 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) (duration: 00m 10s)
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14980 and previous config saved to /var/cache/conftool/dbconfig/20210322-112954-root.json
* 15:47 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided)
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14979 and previous config saved to /var/cache/conftool/dbconfig/20210322-112707-root.json
* 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:37 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:31 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
* 11:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14978 and previous config saved to /var/cache/conftool/dbconfig/20210322-111451-root.json
* 15:25 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 11:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14977 and previous config saved to /var/cache/conftool/dbconfig/20210322-111203-root.json
* 15:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:19 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:18 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14976 and previous config saved to /var/cache/conftool/dbconfig/20210322-105947-root.json
* 15:18 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14975 and previous config saved to /var/cache/conftool/dbconfig/20210322-105700-root.json
* 15:17 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:15 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:51 moritzm: installing libdbi-perl security updates
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:09 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:03 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) (duration: 00m 06s)
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:02 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided)
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14974 and previous config saved to /var/cache/conftool/dbconfig/20210322-104443-root.json
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14973 and previous config saved to /var/cache/conftool/dbconfig/20210322-104156-root.json
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:42 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 14:34 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 14:31 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:01 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-codfw
* 10:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:59 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:52 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:50 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 10:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:47 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:18 TheresNoTime: closing backport window
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] (duration: 08m 45s)
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:07 samtar@deploy1002: samtar and s-mukuti: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]]
* 10:12 elukey: run homer for cr1/cr2 eqiad and codfw to add new iBGP session for the k8s ML clusters - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/661055
* 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 09:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config cleanup (duration: 00m 57s)
* 12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - [[phab:T332012|T332012]] (duration: 06m 19s)
* 09:49 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config cleanup (duration: 00m 59s)
* 12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 09:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config cleanup (duration: 01m 20s)
* 12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for schema change', diff saved to https://phabricator.wikimedia.org/P14971 and previous config saved to /var/cache/conftool/dbconfig/20210322-093558-marostegui.json
* 12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14970 and previous config saved to /var/cache/conftool/dbconfig/20210322-091534-root.json
* 12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14969 and previous config saved to /var/cache/conftool/dbconfig/20210322-090030-root.json
* 12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14968 and previous config saved to /var/cache/conftool/dbconfig/20210322-084527-root.json
* 12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14967 and previous config saved to /var/cache/conftool/dbconfig/20210322-083023-root.json
* 12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - [[phab:T332012|T332012]] (duration: 07m 00s)
* 08:13 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]] [[phab:T268435|T268435]]
* 12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it
* 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 08:02 jayme: build and release docker-registry.discovery.wmnet/eventrouter:0.3.0-6, docker-registry.discovery.wmnet/fluent-bit:1.5.3-3, docker-registry.discovery.wmnet/ratelimit:1.5.1-s3
* 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
* 08:00 marostegui: Stop MySQL on db1085 to clone db1165 (lag will appear on s6 on wiki replicas) [[phab:T258361|T258361]]
* 12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
* 08: