You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(cdanis@cumin1001: dbctl commit (dc=all): 'slight deweight to db1111', diff saved to https://phabricator.wikimedia.org/P10960 and previous config saved to /var/cache/conftool/dbconfig/20200411-195235-cdanis.json)
imported>Stashbot
(zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in group1 wikis (T299954) (duration: 08m 00s))
 
Line 1: Line 1:
== 2020-04-11 ==
== 2023-05-30 ==
* 19:52 cdanis@cumin1001: dbctl commit (dc=all): 'slight deweight to db1111', diff saved to https://phabricator.wikimedia.org/P10960 and previous config saved to /var/cache/conftool/dbconfig/20200411-195235-cdanis.json
* 23:38 zabe@deploy1002: Finished scap: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]] (duration: 08m 00s)
* 17:35 cdanis@cumin1001: dbctl commit (dc=all): 's8: +weight db1111, -weight db1126', diff saved to https://phabricator.wikimedia.org/P10959 and previous config saved to /var/cache/conftool/dbconfig/20200411-173517-cdanis.json
* 23:31 zabe@deploy1002: zabe: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 15:39 vgutierrez: restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet- [[phab:T249335|T249335]]
* 23:30 zabe@deploy1002: Started scap: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]]
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 22:22 ejegg: civicrm upgraded from {{Gerrit|415aa7e5}} to {{Gerrit|5905a403}}
* 09:20 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 21:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] (duration: 07m 48s)
* 07:01 vgutierrez: restart ats-tls on cp[1079,1081,1083,1085].eqiad.wmnet- [[phab:T249335|T249335]]
* 21:50 samtar@deploy1002: jforrester and samtar: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 21:48 samtar@deploy1002: Started scap: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]]
* 20:58 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:924569{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:57 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924569{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]]
* 20:40 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] (duration: 09m 27s)
* 20:32 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:30 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]]
* 20:12 inflatador: bking@wdqs2009 depool wdqs2009 until it catches up with lag
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]] (duration: 08m 46s)
* 20:03 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:01 samtar@deploy1002: Started scap: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]]
* 19:48 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cd667c2]: Deplot Iceberg version of referrer_daily on analytics Airflow instance. [[phab:T335305|T335305]]. (duration: 00m 09s)
* 19:48 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cd667c2]: Deplot Iceberg version of referrer_daily on analytics Airflow instance. [[phab:T335305|T335305]].
* 19:36 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 04m 02s)
* 19:32 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 54s)
* 19:29 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 16m 36s)
* 19:24 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
* 19:12 inflatador: [WDQS Deploy] Deploying version 0.3.124
* 19:11 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 18:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.11  refs [[phab:T337525|T337525]]
* 17:45 mutante: re-enabling puppet on contint2001
* 16:20 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyEdited
* 16:19 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyRegistered
* 16:14 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:924053{{!}}[Growth] Enable user impact refresh on 10 more wikis (T336203)]] (duration: 07m 08s)
* 16:07 urbanecm@deploy1002: Started scap: Backport for [[gerrit:924053{{!}}[Growth] Enable user impact refresh on 10 more wikis (T336203)]]
* 16:00 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:00 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:58 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:57 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:56 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:56 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:54 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2002.codfw.wmnet with OS bullseye
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:49 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:49 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:15 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
* 15:15 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 15:14 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 15:10 tgr_: UTC evening deploys done
* 15:08 tgr@deploy1002: Finished scap: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]] (duration: 08m 08s)
* 15:05 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 15:03 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 15:02 tgr@deploy1002: tgr and matmarex: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 15:00 tgr@deploy1002: Started scap: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]]
* 14:50 moritzm: installing texlive-bin security updates
* 14:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
* 14:46 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
* 14:36 tgr@deploy1002: Finished scap: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]] (duration: 08m 01s)
* 14:29 tgr@deploy1002: matmarex and tgr: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:28 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
* 14:27 tgr@deploy1002: Started scap: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]]
* 14:16 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mwlog2002.codfw.wmnet with OS bullseye
* 14:16 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetdb1003.eqiad.wmnet with OS bookworm
* 14:14 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:13 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:08 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:06 moritzm: installing libwebp security updates
* 14:06 tgr@deploy1002: Finished scap: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]] (duration: 08m 14s)
* 13:59 tgr@deploy1002: tgr and matmarex: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:58 tgr@deploy1002: Started scap: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]]
* 13:57 tgr@deploy1002: Finished scap: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]] (duration: 16m 13s)
* 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2009.codfw.wmnet
* 13:55 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
* 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS bullseye
* 13:42 tgr@deploy1002: tgr and daimona: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:40 tgr@deploy1002: Started scap: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]]
* 13:33 mlitn@deploy1002: Finished scap: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]] (duration: 07m 39s)
* 13:27 mlitn@deploy1002: mlitn: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:25 mlitn@deploy1002: Started scap: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]]
* 13:20 tgr@deploy1002: Finished scap: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]] (duration: 09m 26s)
* 13:13 tgr@deploy1002: tgr: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:11 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
* 13:11 tgr@deploy1002: Started scap: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]]
* 13:09 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
* 13:09 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 13:09 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 13:08 bblack: lvs1018: restart pybal for wikireplicas monitoring removal
* 13:08 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 13:06 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 13:06 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
* 13:06 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 13:03 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
* 13:00 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
* 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS bullseye
* 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:29 volans: disablig puppet where cadvisor is present
* 12:14 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
* 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
* 11:50 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
* 11:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 11:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 11:46 slyngshede@cumin1001: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 11:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
* 11:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
* 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:14 hashar@deploy1002: Finished deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - [[phab:T331651|T331651]] (duration: 00m 08s)
* 11:14 hashar@deploy1002: Started deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - [[phab:T331651|T331651]]
* 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2006.codfw.wmnet with OS bookworm
* 11:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
* 10:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 10:50 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
* 10:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 10:41 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 10:11 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard2003.codfw.wmnet with OS bookworm
* 10:11 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 10:00 zabe@deploy1002: Finished scap: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]] (duration: 08m 12s)
* 09:59 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
* 09:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
* 09:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
* 09:54 zabe@deploy1002: zabe: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 09:52 zabe@deploy1002: Started scap: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]]
* 09:52 zabe@deploy1002: Finished scap: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]] (duration: 09m 52s)
* 09:49 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
* 09:46 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
* 09:43 zabe@deploy1002: zabe: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:43 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bookworm
* 09:42 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:42 zabe@deploy1002: Started scap: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]]
* 09:40 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 09:37 zabe@deploy1002: Finished scap: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]] (duration: 07m 48s)
* 09:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
* 09:33 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:33 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:32 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
* 09:30 zabe@deploy1002: zabe: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:30 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:30 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bookworm
* 09:29 zabe@deploy1002: Started scap: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]]
* 09:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:24 tgr@deploy1002: Finished scap: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]] (duration: 08m 57s)
* 09:22 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 09:20 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 09:19 arturo: run aborrero@cumin1001:~ 2s 98 $ sudo cumin "P<nowiki>{</nowiki>R:Profile::Mariadb::Section = 's7'<nowiki>}</nowiki> and P<nowiki>{</nowiki>P:wmcs::db::wikireplicas::mariadb_multiinstance<nowiki>}</nowiki>" "/usr/local/sbin/maintain-meta_p --all-databases --bootstrap"
* 09:17 tgr@deploy1002: tgr: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 09:15 tgr@deploy1002: Started scap: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]]
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:14 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:13 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:11 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:11 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 09:06 tgr@deploy1002: Finished scap: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]] (duration: 14m 22s)
* 09:00 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:00 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:59 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:54 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:53 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:53 tgr@deploy1002: tgr: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 08:52 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:51 tgr@deploy1002: Started scap: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]]
* 08:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:50 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:49 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:49 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:48 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:43 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:41 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:39 tgr@deploy1002: Finished scap: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]] (duration: 10m 30s)
* 08:39 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:39 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:36 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:36 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:35 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:34 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:33 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:31 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:31 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:30 tgr@deploy1002: tgr: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:29 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:29 tgr@deploy1002: Started scap: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]]
* 08:29 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:27 jayme: re-enable puppet on P:kubernetes::node for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
* 08:25 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:20 jayme: disable puppet on P:kubernetes::node (apart from staging-codfw) for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:15 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:14 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:12 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:12 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
* 08:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
* 08:08 tgr@deploy1002: Finished scap: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]] (duration: 07m 51s)
* 08:01 tgr@deploy1002: tgr: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:00 tgr@deploy1002: Started scap: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]]
* 07:56 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]] (duration: 09m 17s)
* 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb2003.codfw.wmnet with OS bookworm
* 07:48 ladsgroup@deploy1002: func and ladsgroup: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:46 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]]
* 07:45 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:45 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48633 and previous config saved to /var/cache/conftool/dbconfig/20230530-074445-root.json
* 07:44 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:42 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:41 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:40 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:38 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:38 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:31 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:31 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:30 moritzm: move LDAP permissions for hghani from cn=nda to cn=wmf [[phab:T322145|T322145]]
* 07:30 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48632 and previous config saved to /var/cache/conftool/dbconfig/20230530-072941-root.json
* 07:29 kartik@deploy1002: Finished scap: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]] (duration: 09m 38s)
* 07:28 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:21 kartik@deploy1002: kartik: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:19 kartik@deploy1002: Started scap: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]]
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:17 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:16 kartik@deploy1002: Finished scap: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]] (duration: 11m 49s)
* 07:16 moritzm: update bookworm installer to rc4 [[phab:T330495|T330495]]
* 07:16 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48630 and previous config saved to /var/cache/conftool/dbconfig/20230530-071436-root.json
* 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:10 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:09 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 07:06 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]]
* 07:04 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:03 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:02 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:02 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:01 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48629 and previous config saved to /var/cache/conftool/dbconfig/20230530-065932-root.json
* 06:58 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 06:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 06:51 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:50 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:48 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 06:48 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48628 and previous config saved to /var/cache/conftool/dbconfig/20230530-064427-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48625 and previous config saved to /var/cache/conftool/dbconfig/20230530-062922-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48624 and previous config saved to /var/cache/conftool/dbconfig/20230530-061417-root.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48623 and previous config saved to /var/cache/conftool/dbconfig/20230530-055913-root.json
* 05:43 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 62597
* 05:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
* 05:41 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 1255 hosts
* 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 1255 hosts
* 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 784 hosts
* 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 784 hosts
* 05:28 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 784 hosts
* 05:27 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 784 hosts
* 05:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 1255 hosts
* 05:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 1255 hosts
* 05:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62597
* 05:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
* 04:28 kart_: Updated cxserver to 2023-05-29-112644-production ([[phab:T337657|T337657]])
* 04:28 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 04:27 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 04:24 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 04:24 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 04:21 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 04:20 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.9 (duration: 02m 10s)
* 03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.11  refs [[phab:T337525|T337525]] (duration: 49m 54s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.11  refs [[phab:T337525|T337525]]


== 2020-04-10 ==
== 2023-05-29 ==
* 21:12 cdanis@cumin1001: dbctl commit (dc=all): 'db1111 seems overloaded', diff saved to https://phabricator.wikimedia.org/P10954 and previous config saved to /var/cache/conftool/dbconfig/20200410-211202-cdanis.json
* 15:19 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
* 19:37 cdanis: cdanis@re0.cr1-codfw> clear bfd session address 208.80.153.220
* 15:19 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
* 15:03 vgutierrez: restart ats-tls on cp1083 and cp1085 - [[phab:T249335|T249335]]
* 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 13:14 hashar@deploy1001: Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 40s)
* 14:18 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 13:14 hashar@deploy1001: Started deploy [zuul/deploy@4a69913]: (no justification provided)
* 13:57 vgutierrez@puppetmaster1001: conftool action : set/weight=10; selector: name=dbproxy.*,dc=eqiad
* 13:12 mutante: restarted and re-armed keyholder on deploy1001 to pick up changes for zuul scap deploy
* 11:25 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 12:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:24 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 12:11 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48618 and previous config saved to /var/cache/conftool/dbconfig/20230529-112242-root.json
* 12:10 mutante: Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. ([[phab:T249907|T249907]])
* 11:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 12:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 12:10 mutante: Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. This may take a few minutes.
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48617 and previous config saved to /var/cache/conftool/dbconfig/20230529-110737-root.json
* 12:10 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48616 and previous config saved to /var/cache/conftool/dbconfig/20230529-105233-root.json
* 12:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48615 and previous config saved to /var/cache/conftool/dbconfig/20230529-103728-root.json
* 12:09 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48614 and previous config saved to /var/cache/conftool/dbconfig/20230529-102223-root.json
* 11:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'canary' .
* 10:07 vgutierrez: restarting pybal on lvs1018
* 11:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48612 and previous config saved to /var/cache/conftool/dbconfig/20230529-100719-root.json
* 11:44 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:39 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 10:05 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10953 and previous config saved to /var/cache/conftool/dbconfig/20200410-094359-marostegui.json
* 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10952 and previous config saved to /var/cache/conftool/dbconfig/20200410-093129-marostegui.json
* 10:05 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 08:52 hashar@deploy1001: Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 16s)
* 10:04 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 08:51 hashar@deploy1001: Started deploy [zuul/deploy@4a69913]: (no justification provided)
* 10:04 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 08:46 hashar@deploy1001: Finished deploy [zuul/deploy@5a0a03a]: (no justification provided) (duration: 02m 20s)
* 10:03 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 08:44 hashar@deploy1001: Started deploy [zuul/deploy@5a0a03a]: (no justification provided)
* 10:03 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 08:39 mutante: deploy1001 - keyholder disarm, keyholder arm
* 10:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 08:32 mutante: fix comment in deployment ssh key for zuul to include the path to the key on deploy1001
* 10:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 08:24 vgutierrez: update puppet compiler facts
* 10:00 vgutierrez: restarting pybal on lvs1020
* 08:20 hashar@deploy1001: Finished deploy [integration/zuul/deploy@6c3ddad]: (no justification provided) (duration: 00m 11s)
* 09:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 08:19 hashar@deploy1001: Started deploy [integration/zuul/deploy@6c3ddad]: (no justification provided)
* 09:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 08:03 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 05s)
* 09:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 08:03 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 09:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 07:52 mutante: closing port 80 on phab hosts for caching servers
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48611 and previous config saved to /var/cache/conftool/dbconfig/20230529-095214-root.json
* 07:37 ema: cp3050: back to vhtcpd for the holidays [[phab:T249583|T249583]]
* 09:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 07:00 mutante: sodium - sudo -u mirror ftpsync
* 09:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 06:58 mutante: armed keyholder on deploy1001
* 09:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 06:00 marostegui: Stop MySQL on pc1008 for upgrade
* 09:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48610 and previous config saved to /var/cache/conftool/dbconfig/20230529-093709-root.json
* 09:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 09:31 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 09:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 09:29 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 09:13 godog: start partial rollout of cadvisor to eqiad/codfw (~10%) [[phab:T108027|T108027]]
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48609 and previous config saved to /var/cache/conftool/dbconfig/20230529-090216-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48608 and previous config saved to /var/cache/conftool/dbconfig/20230529-084711-root.json
* 08:45 godog: delete old raw blocks from thanos - [[phab:T337236|T337236]]
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48607 and previous config saved to /var/cache/conftool/dbconfig/20230529-083206-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48606 and previous config saved to /var/cache/conftool/dbconfig/20230529-081702-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48604 and previous config saved to /var/cache/conftool/dbconfig/20230529-080157-root.json
* 07:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48603 and previous config saved to /var/cache/conftool/dbconfig/20230529-074653-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48602 and previous config saved to /var/cache/conftool/dbconfig/20230529-073148-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48601 and previous config saved to /var/cache/conftool/dbconfig/20230529-071643-root.json
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s2, s3, s5 [[phab:T337446|T337446]]', diff saved to https://phabricator.wikimedia.org/P48598 and previous config saved to /var/cache/conftool/dbconfig/20230529-051043-root.json


== 2020-04-09 ==
== 2023-05-28 ==
* 23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 13:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 23:27 catrope@deploy1001: Synchronized wmf-config/mobile.php: Drop fallback support for wgMobileFrontendLogo ([[phab:T248500|T248500]]) (duration: 00m 58s)
* 13:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 23:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Drop unused config for main page CSS ([[phab:T243996|T243996]]) (duration: 00m 58s)
* 13:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add extendedconfirmed group and protection level on jawiki ([[phab:T249820|T249820]]) (duration: 00m 59s)
* 13:16 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 22:01 sukhe: running initial metadb sync on cescout1001
* 06:12 marostegui: Change innodb_fast_shutdown to 0 on db1154 before downgrading [[phab:T337446|T337446]]
* 19:43 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 19:41 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 19:39 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 19:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 19:01 longma: deploying 1.35.0-wmf.27 to all wikis
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:40 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:24 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:39 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:32 XioNoX: disable down interfaces from fasw-c-codfw (mintaka)
* 13:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:43 mlitn@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/MachineVision/: [MachineVision] Fix statement creation from suggestion (duration: 01m 09s)
* 12:31 ema: cp3051: upgrade varnish to 5.1.3-1wm13 once again, restart varnish-fe [[phab:T249809|T249809]]
* 11:57 XioNoX: offload more traffic from NTT eqiad - [[phab:T249808|T249808]]
* 11:20 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}587257{{!}}Enable ContentTranslation as a default tool in Slovenian WP (T248836)]], take II (duration: 01m 06s)
* 11:19 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}587257{{!}}Enable ContentTranslation as a default tool in Slovenian WP (T248836)]] (duration: 01m 07s)
* 10:50 vgutierrez: rolling upgrade to trafficserver 8.0.6-1mw7
* 10:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:50 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:43 ema: repool cp3051 [[phab:T249809|T249809]]
* 10:30 ema: cp3051: re-enable transient storage limit, downgrade varnish to 5.1.3-1wm12 (no 0035-vbf_stp_condfetch_crash.patch) and restart varnish-fe [[phab:T249809|T249809]]
* 09:46 ema: cp3051: disable transient storage limit and restart varnish-fe [[phab:T249809|T249809]]
* 09:31 XioNoX: offload traffic from NTT eqiad - [[phab:T249808|T249808]]
* 07:56 mutante: contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3  (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) ([[phab:T224591|T224591]])
* 07:56 mutante: contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3  (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206)
* 07:24 moritzm: synched jenkins 222.1 to apt.wikimedia.org (buster-wikimedia, thirdparty/ci) [[phab:T224591|T224591]]
* 07:12 marostegui: Repool labsdb1011
* 07:10 XioNoX: switch urpf from log to syslog in ulsfo
* 07:04 XioNoX: re-activate BGP to Zayo in eqiad
* 06:59 vgutierrez: upgrade ats to version 8.0.6-1wm7 in cp[4026,4032,5006,5012]
* 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:43 XioNoX: confirmed on one host that the change didn't break logstash. Re-enable Puppet on logstash hosts - [[phab:T244147|T244147]]
* 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:36 XioNoX: disabling puppet on logstash host for CR deploy - [[phab:T244147|T244147]]
* 06:30 XioNoX: push urpf log only to eqiad - [[phab:T244147|T244147]]
* 06:25 XioNoX: push urpf log only to eqsin - [[phab:T244147|T244147]]
* 06:21 XioNoX: push urpf log only to AMS - [[phab:T244147|T244147]]
* 05:40 vgutierrez: upgrade ats to version 8.0.6-1wm6 in cp[4025,4031,5005,5011] - [[phab:T249335|T249335]]
* 05:37 marostegui: Stop MySQL on pc2008 for upgrade to Buster and 10.4
* 05:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 01m 08s)
* 05:08 marostegui: Deploy schema change on db1123
* 05:07 vgutierrez: upload trafficserver 8.0.6-1wm6 to apt.wm.o (buster) - [[phab:T249335|T249335]]


== 2020-04-08 ==
== 2023-05-27 ==
* 21:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: Restore call to OutputPage::setupOOUI() (duration: 01m 07s)
* 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 ([[phab:T337446|T337446]])
* 21:19 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: Restore call to OutputPage::setupOOUI() (duration: 01m 09s)
* 17:42 godog: silence systemd state alert flapping on stat1009 until monday
* 20:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 00:03 tzatziki: removing 1 file for legal compliance
* 20:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 20:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 20:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 20:04 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 20:04 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 19:51 gehel: restart wdqs-updater after deployment
* 19:49 mstyles@deploy1001: Finished deploy [wdqs/wdqs@c2995eb]: WDQS version 0.3.21 (duration: 14m 37s)
* 19:44 dpifke@deploy1001: Finished deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091 (duration: 00m 05s)
* 19:44 dpifke@deploy1001: Started deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091
* 19:35 mstyles@deploy1001: Started deploy [wdqs/wdqs@c2995eb]: WDQS version 0.3.21
* 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]] (duration: 01m 06s)
* 19:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 19:02 longma: deploying 1.35.0-wmf.27 to group1
* 18:37 jforrester@deploy1001: Synchronized php-1.35.0-wmf.27/skins/Vector: [[phab:T248761|T248761]]: Revert moving indicators in DOM (duration: 01m 07s)
* 18:17 reedy@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: [[phab:T236809|T236809]] (duration: 01m 06s)
* 18:16 reedy@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: [[phab:T236809|T236809]] (duration: 01m 10s)
* 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 ema: cache_upload: rolling varnish-fe restarts to bump transient storage limit [[phab:T185968|T185968]]
* 15:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 ema: cp3051: param.set shortlived=0 to try ease pressure on transient memory
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P10947 and previous config saved to /var/cache/conftool/dbconfig/20200408-142341-marostegui.json
* 14:14 jeh@deploy1001: Finished deploy [horizon/deploy@0d18f67]: update horizon submodule to enable server groups (duration: 03m 30s)
* 14:10 jeh@deploy1001: Started deploy [horizon/deploy@0d18f67]: update horizon submodule to enable server groups
* 13:40 mutante: stopped and masked zuul-merger service on contint2001 via puppet ([[phab:T224591|T224591]])
* 13:30 ema: cp3050: stop vhtcpd, start purged [[phab:T249583|T249583]]
* 13:22 vgutierrez: enable inbound TLSv1.3 in text@ulsfo - [[phab:T170567|T170567]]
* 13:05 ema: purged 0.1 uploaded to buster-wikimedia [[phab:T249583|T249583]]
* 12:31 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 07s)
* 12:29 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585219{{!}}Enable GrowthExperiments suggested edits on uk, hu, hy, eu wikipedias (T247308)]] (duration: 01m 08s)
* {{safesubst:SAL entry|1=12:17 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584135{{!}}Enable GrowthExperiments welcome survey on Ukrainian, Hungarian, Armenian Wikipedias (T238295) (duration: 01m 08s)}}
* 12:09 tgr@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:584183{{!}}Enable GrowthExperiments on French Wiktionary (T235964)]] (duration: 01m 06s)
* 11:56 tgr@deploy1001: Synchronized dblists/: SWAT: [[gerrit:584183{{!}}Enable GrowthExperiments on French Wiktionary (T235964)]] (duration: 01m 03s)
* 11:48 mutante: logstash1009 - restarted logstash
* 11:43 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585766{{!}}Enable WikibaseQualityConstraints on test commons (T248117)]] (duration: 01m 05s)
* 11:43 marostegui: Deploy schema change on db1112, this will generate lag on labs s3
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P10942 and previous config saved to /var/cache/conftool/dbconfig/20200408-114315-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change', diff saved to https://phabricator.wikimedia.org/P10941 and previous config saved to /var/cache/conftool/dbconfig/20200408-113901-marostegui.json
* 11:29 tgr@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:584133{{!}}Deploy GrowthExperiments on Serbian Wikipedia (T241181)]] (duration: 01m 06s)
* 11:28 tgr@deploy1001: Synchronized dblists/: SWAT: [[gerrit:584133{{!}}Deploy GrowthExperiments on Serbian Wikipedia (T241181)]] (duration: 01m 17s)
* 11:05 XioNoX: push urpf log only to codfw - [[phab:T244147|T244147]]
* 10:39 jbond42: restarting idp.wikimedia.org
* 10:14 marostegui: Deploy schema change on db1078
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P10940 and previous config saved to /var/cache/conftool/dbconfig/20200408-101431-marostegui.json
* 09:30 jynus: stopping and removing db1095:s8 instance
* 09:20 godog: upgrade grafana on cloudmetrics hosts - [[phab:T244208|T244208]]
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change', diff saved to https://phabricator.wikimedia.org/P10939 and previous config saved to /var/cache/conftool/dbconfig/20200408-091728-marostegui.json
* 09:11 gehel: setting weight=10 for all pooled wdqs servers in codfw - [[phab:T246343|T246343]]
* 09:10 marostegui: Reload proxies on dbproxy1018 and dbproxy1019 to depool labsdb1011 - [[phab:T249188|T249188]] [[phab:T248592|T248592]]
* 09:07 gehel: pooling wdqs200[78] - new servers ready to go! - [[phab:T246343|T246343]]
* 08:46 marostegui: Rename wb_terms and recreate views on labsdb1009-labsdb1011 - [[phab:T248592|T248592]] [[phab:T248086|T248086]]
* 08:39 godog: upgrade grafana on grafana1002 - [[phab:T244208|T244208]]
* 08:17 _joe_: switching parsoid to envoy (take 2) in eqiad
* 07:23 marostegui: Deploy schema change on db1075
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P10937 and previous config saved to /var/cache/conftool/dbconfig/20200408-072331-marostegui.json
* 06:31 marostegui: Deploy schema change on db1095:3313
* 06:11 marostegui: Stop haproxy on dbproxy1011 - [[phab:T231520|T231520]]
* 05:44 vgutierrez: rolling upgrade ATS to 8.0.6-1wm6 in cp[5006,5012,3065,3064,2042,2041,1090,1089]
* 05:34 marostegui: Deploy schema change on dbstore1004:3313
* 05:33 _joe_: repooling wtp1025, with envoy and logging any error above 404 [[phab:T249535|T249535]]
* 04:36 vgutierrez: rolling restart of ats-tls - [[phab:T249335|T249335]]


== 2020-04-07 ==
== 2023-05-26 ==
* 20:39 andrewbogott: correction: briefly downtiming ldap-eqiad-replica0 and ldap-eqiad-replica1.  I'm trying to investigate a possible split-brain so going to turn ldap off on one, and then the other, to see if behavior changes
* 23:48 tzatziki: removing 2 files for legal compliance
* 20:37 andrewbogott: briefly downtiming serpens and seaborgium. I'm trying to investigate a possible split-brain so going to turn ldap off on one, and then the other, to see if behavior changes
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:34 hoo: (Take 3) Temporary modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:17 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 20:09 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.27 (duration: 60m 34s)
* 20:47 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 20:08 hoo: (Take 2) Temporary modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:45 hoo: Temporary modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:13 XioNoX: push pfw firewall rules - [[phab:T249650|T249650]]
* 19:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 19:08 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.27
* 19:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 18:48 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.24 (duration: 12m 44s)
* 19:15 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 17:56 herron: increasing codfw.mediawiki.job.cirrusSearchElasticaWrite to 3 partitions [[phab:T240702|T240702]]
* 19:15 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 17:55 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) retry (duration: 01m 02s)
* 18:26 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 17:54 addshore: last sync stuck on sync-masters
* 17:38 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 06m 10s)
* 17:54 addshore@deploy1001: sync-file aborted: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) (duration: 01m 16s)
* 17:31 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 17:49 ppchelko@deploy1001: Started restart [cpjobqueue/deploy@83c93d1]: Try to make it notice new partitions [[phab:T240702|T240702]]
* 16:37 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard2003.codfw.wmnet with OS bookworm
* 17:40 herron: increasing eqiad.mediawiki.job.cirrusSearchElasticaWrite to 3 partitions [[phab:T240702|T240702]]
* 16:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 16:24 longma: 1.35.0-wmf.27 was branched at {{Gerrit|e76ac29cd9c57bed4097ec8a4ea8311fb55fd967}} for [[phab:T247774|T247774]]
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:16 hashar: restarting CI jenkins
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 15:53 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 15:21 moritzm: installing idp-test2001
* 15:50 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 15:20 XioNoX: enable uRPF loose mode (log only) on cr4-ulsfo - [[phab:T244147|T244147]]
* 15:41 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 15:17 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (12/14.5h) (duration: 01m 00s)
* 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:10 ema: cp3052: stop purged, start vhtcpd [[phab:T249583|T249583]] [[phab:T241232|T241232]]
* 15:40 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 15:00 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:56 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (10/14.5h) (duration: 00m 55s)
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:52 jeh: cloudvirt2003-dev: downtime in icinga and reboot to enable BIOS virtualization support [[phab:T249453|T249453]]
* 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:38 ema: cp3052: stop vhtcpd, start purged [[phab:T249583|T249583]]
* 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:35 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (8/14.5h) (duration: 00m 58s)
* 15:31 nskaggs@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
* 14:25 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (4/14.5h) (duration: 00m 58s)
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:15 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (2/14.5h) (duration: 00m 58s)
* 15:08 nskaggs@cumin1001: START - Cookbook sre.wikireplicas.update-views
* 14:08 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) take 2 (duration: 00m 57s)
* 14:26 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=videoscaler,dc=eqiad,name=parse.*
* 13:57 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: REVERT [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 58s)
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=parse.*
* 13:55 addshore@deploy1001: sync-file aborted: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 29s)
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 13:17 vgutierrez: restart ats-tls on cp3056 - [[phab:T249335|T249335]]
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 12:59 vgutierrez: restart ats-tls on cp3052- [[phab:T249335|T249335]]
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
* 12:50 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-6.list > [[phab:T249596|T249596]]-6.out # [[phab:T249565|T249565]]
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 12:43 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-5.list > [[phab:T249596|T249596]]-5.out # [[phab:T249565|T249565]]
* 14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 12:42 vgutierrez: restart ats-tls on cp3058 - [[phab:T249335|T249335]]
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
* 12:25 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
* 12:06 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-4.list > [[phab:T249596|T249596]]-4.out # [[phab:T249565|T249565]] [[phab:T249596|T249596]]
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'repool db1126', diff saved to https://phabricator.wikimedia.org/P10932 and previous config saved to /var/cache/conftool/dbconfig/20200407-115228-marostegui.json
* 14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1126', diff saved to https://phabricator.wikimedia.org/P10931 and previous config saved to /var/cache/conftool/dbconfig/20200407-115154-marostegui.json
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092, db1111, db1099:3318 after table rename', diff saved to https://phabricator.wikimedia.org/P10930 and previous config saved to /var/cache/conftool/dbconfig/20200407-115058-marostegui.json
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 11:50 jynus: renaming wb_items_per_site_recovered to wb_items_per_site on s8
* 14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 11:45 jynus: stopping s8 replication on db1116:3318, db1095:3318, db2079
* 14:02 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092, db1111, db1099:3318 for table rename', diff saved to https://phabricator.wikimedia.org/P10929 and previous config saved to /var/cache/conftool/dbconfig/20200407-114258-marostegui.json
* 14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
* 11:36 Amir1: stopped the rebuilt script ([[phab:T249565|T249565]])
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
* 11:34 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: cleanup [[phab:T203888|T203888]], Remove old unused RejectParserCacheValue hook (duration: 00m 59s)
* 14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
* 11:09 marostegui: Deploy schema change on s3 codfw
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:07 jynus: starting recovery on all s8 hosts
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 10:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 10:41 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php: [[phab:T249565|T249565]] [[phab:T249596|T249596]] Wikibase rebuildItemsPerSite.php script that allows lists of ids (duration: 01m 00s)
* 13:58 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 10:27 jynus: starting recovery on db1099:3318
* 13:58 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetboard2003.codfw.wmnet
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 after schema change', diff saved to https://phabricator.wikimedia.org/P10927 and previous config saved to /var/cache/conftool/dbconfig/20200407-095852-marostegui.json
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 09:49 volans@deploy1001: Finished deploy [homer/deploy@887544c]: Release v0.2.0 (take 2) (duration: 00m 26s)
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:49 volans@deploy1001: Started deploy [homer/deploy@887544c]: Release v0.2.0 (take 2)
* 13:56 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 09:38 marostegui: Deploy schema change on db1119
* 13:56 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P10926 and previous config saved to /var/cache/conftool/dbconfig/20200407-093820-marostegui.json
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb1003.eqiad.wmnet
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134 after schema change', diff saved to https://phabricator.wikimedia.org/P10925 and previous config saved to /var/cache/conftool/dbconfig/20200407-093638-marostegui.json
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:31 volans@deploy1001: Finished deploy [homer/deploy@b4522ad]: Release v0.2.0 (duration: 00m 16s)
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 09:31 volans@deploy1001: Started deploy [homer/deploy@b4522ad]: Release v0.2.0
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:29 volans@deploy1001: Finished deploy [homer/deploy@ac7a818]: Inject plugins (take 3) (duration: 03m 03s)
* 13:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:26 volans@deploy1001: Started deploy [homer/deploy@ac7a818]: Inject plugins (take 3)
* 13:51 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 09:19 marostegui: Deploy schema change on db1134
* 13:46 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P10924 and previous config saved to /var/cache/conftool/dbconfig/20200407-091847-marostegui.json
* 13:46 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 09:17 volans@deploy1001: Finished deploy [homer/deploy@a03d7cd]: Inject plugins (take 2) (duration: 00m 29s)
* 13:45 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 09:17 volans@deploy1001: Started deploy [homer/deploy@a03d7cd]: Inject plugins (take 2)
* 13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
* 09:04 vgutierrez: testing ATS 8.0.6-1wm6 on cp4026 and cp4032
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:58 volans@deploy1001: Finished deploy [homer/deploy@a03d7cd]: Inject plugins (duration: 04m 59s)
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 08:53 volans@deploy1001: Started deploy [homer/deploy@a03d7cd]: Inject plugins
* 13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 08:46 XioNoX: enable uRPF loose mode (log only) on cr3-ulsfo v4 uplinks - [[phab:T244147|T244147]]
* 13:06 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 08:44 XioNoX: enable uRPF loose mode (log only) on cr3-ulsfo v6 uplinks - [[phab:T244147|T244147]]
* 12:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
* 08:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:37 mutante: decom ganeti VM miscweb1001 (stretch) - kept backup of old racktables files and db dump in /root/racktables on miscweb1002 ([[phab:T247648|T247648]])
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 08:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 12:41 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 08:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 08:30 mutante: decom ganeti VM miscweb2001 (stretch)
* 12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 08:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]]
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after schema change', diff saved to https://phabricator.wikimedia.org/P10923 and previous config saved to /var/cache/conftool/dbconfig/20200407-082607-marostegui.json
* 11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
* 08:17 moritzm: installing php5 security updates
* 11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 08:06 marostegui: Deploy schema change on db1106 (this will generate lag on s1 labs)
* 11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]]
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for schema change', diff saved to https://phabricator.wikimedia.org/P10922 and previous config saved to /var/cache/conftool/dbconfig/20200407-080533-marostegui.json
* 10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1080 after schema change', diff saved to https://phabricator.wikimedia.org/P10921 and previous config saved to /var/cache/conftool/dbconfig/20200407-080443-marostegui.json
* 10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 07:52 _joe_: disabling puppet on mwdebug1002
* 10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
* 07:47 marostegui: Failover dbproxy1011 to dbproxy1019 - [[phab:T231520|T231520]])
* 10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
* 07:43 marostegui: Deploy schema change on db1080
* 09:54 effie: pool parse1013-parse1016 to the jobrunner cluster  - [[phab:T329366|T329366]]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for schema change', diff saved to https://phabricator.wikimedia.org/P10920 and previous config saved to /var/cache/conftool/dbconfig/20200407-074321-marostegui.json
* 09:29 jbond: disable puppet fleet wide to deploy minor puppet change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923353
* 07:41 dcausse@deploy1001: Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs2002: [[phab:T249196|T249196]] (duration: 01m 28s)
* 09:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1016.eqiad.wmnet with OS buster
* 07:40 dcausse@deploy1001: Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs2002: [[phab:T249196|T249196]]
* 09:26 effie: parse1013-parse1016 have neen depooled and removed from the parsoid-php service - [[phab:T329366|T329366]]
* 07:39 _joe_: depooling wtp1025, used for debugging
* 09:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1014.eqiad.wmnet with OS buster
* 07:31 vgutierrez: enable parent proxies in ats-tls - [[phab:T249335|T249335]]
* 09:24 jnuche@deploy1002: Installation of scap version "4.52.3" completed for 596 hosts
* 07:19 jynus: restarting s3 on db1095
* 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1013.eqiad.wmnet with OS buster
* 07:02 moritzm: updating linux-image-4.9.0-11-amd64 where applicable
* 09:23 jnuche@deploy1002: Installing scap version "4.52.3" for 596 hosts
* 06:55 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 06:53 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 09:13 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 06:52 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 09:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parse1015.eqiad.wmnet with OS buster
* 06:37 moritzm: installing ruby2.1 security updates
* 08:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 06:32 jynus: stopping slave (s3) on db1095
* 08:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 05:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:586488{{!}}Fix database name for repo in testwikidata (T249533)]], take II (duration: 00m 58s)
* 08:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 05:37 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:586488{{!}}Fix database name for repo in testwikidata (T249533)]] (duration: 01m 00s)
* 08:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 05:26 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 01:08 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/maintenance/: [[phab:T157651|T157651]] Remove sql.php from maintenance/ (duration: 00m 58s)
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 01:06 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/autoload.php: [[phab:T157651|T157651]] Remove sql.php from autoloader (duration: 00m 58s)
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 01:05 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: [[phab:T208425|T208425]] [[phab:T249565|T249565]] Follow-up {{Gerrit|a956c655}}: Only avoid dropping wb_items_per_site so prod can be merged (duration: 00m 58s)
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 00:01 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table [[phab:T208425|T208425]] [[phab:T249565|T249565]] cache bust (duration: 01m 01s)
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1016.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1015.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1014.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1013.eqiad.wmnet with OS buster
* 08:10 jiji@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse101[3-6].eqiad.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48591 and previous config saved to /var/cache/conftool/dbconfig/20230526-075903-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48590 and previous config saved to /var/cache/conftool/dbconfig/20230526-075809-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48589 and previous config saved to /var/cache/conftool/dbconfig/20230526-074358-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48588 and previous config saved to /var/cache/conftool/dbconfig/20230526-074304-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48587 and previous config saved to /var/cache/conftool/dbconfig/20230526-072854-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48586 and previous config saved to /var/cache/conftool/dbconfig/20230526-072759-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48585 and previous config saved to /var/cache/conftool/dbconfig/20230526-071349-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48584 and previous config saved to /var/cache/conftool/dbconfig/20230526-071255-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48583 and previous config saved to /var/cache/conftool/dbconfig/20230526-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48582 and previous config saved to /var/cache/conftool/dbconfig/20230526-065750-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48581 and previous config saved to /var/cache/conftool/dbconfig/20230526-064340-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48580 and previous config saved to /var/cache/conftool/dbconfig/20230526-064245-root.json
* 06:42 elukey: `apt-get clean` on stat1008 to clean up some space in the root partition
* 06:36 elukey: `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to avoid the root partition to fill up
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48579 and previous config saved to /var/cache/conftool/dbconfig/20230526-062835-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48578 and previous config saved to /var/cache/conftool/dbconfig/20230526-062741-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48577 and previous config saved to /var/cache/conftool/dbconfig/20230526-061330-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48576 and previous config saved to /var/cache/conftool/dbconfig/20230526-061236-root.json
* 03:51 fab@deploy1002: Finished deploy [airflow-dags/research@77cf676]: (no justification provided) (duration: 00m 17s)
* 03:51 fab@deploy1002: Started deploy [airflow-dags/research@77cf676]: (no justification provided)


== 2020-04-06 ==
== 2023-05-25 ==
* 23:59 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table [[phab:T208425|T208425]] [[phab:T249565|T249565]] (duration: 00m 59s)
* 22:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] (duration: 09m 14s)
* 23:31 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki-staging/php-1.35.0-wmf.26$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki
* 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 23:26 Amir1: created wb_items_per_site
* 22:05 zabe@deploy1002: Started scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]]
* 19:05 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
* 19:03 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676]: (no justification provided)
* 19:00 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 20:47 TheresNoTime: close UTC late backport
* 18:58 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] (duration: 08m 34s)
* 18:57 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 20:40 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:51 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:38 samtar@deploy1002: Started scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]]
* 18:42 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] (duration: 10m 58s)
* 18:22 Urbanecm: Morning SWAT done
* 20:22 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|335a924}}: Enable Local upload on azbwiki ([[phab:T248971|T248971]]; take II) (duration: 00m 58s)
* 20:21 samtar@deploy1002: Started scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]]
* 18:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|335a924}}: Enable Local upload on azbwiki ([[phab:T248971|T248971]]) (duration: 00m 59s)
* 20:13 samtar@deploy1002: Finished scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] (duration: 08m 31s)
* 16:54 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:06 samtar@deploy1002: samtar and daimona: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 16:52 _joe_: parsoid migrated to use envoy for TLS termination
* 20:05 samtar@deploy1002: Started scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]]
* 16:24 _joe_: switching parsoid-php to envoy for TLS termination
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:45 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Label blacklist updates ([[phab:T249285|T249285]]) (duration: 00m 58s)
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 15:36 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 19:31 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 15:04 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 19:29 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 14:59 addshore: deploy slot done
* 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48575 and previous config saved to /var/cache/conftool/dbconfig/20230525-190946-root.json
* 14:55 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration [[phab:T248664|T248664]] (cache bust) (duration: 00m 57s)
* 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48574 and previous config saved to /var/cache/conftool/dbconfig/20230525-190859-root.json
* 14:54 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration [[phab:T248664|T248664]] (duration: 00m 57s)
* 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48573 and previous config saved to /var/cache/conftool/dbconfig/20230525-185441-root.json
* 14:50 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix [[phab:T248664|T248664]] (cache bust) (duration: 00m 57s)
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48572 and previous config saved to /var/cache/conftool/dbconfig/20230525-185354-root.json
* 14:49 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix [[phab:T248664|T248664]] (duration: 00m 58s)
* 18:43 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@6b27584]: (no justification provided) (duration: 00m 19s)
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10912 and previous config saved to /var/cache/conftool/dbconfig/20200406-144220-marostegui.json
* 18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
* 14:41 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config [[phab:T248664|T248664]] (cache bust) (duration: 00m 58s)
* 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
* 14:40 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config [[phab:T248664|T248664]] (duration: 00m 59s)
* 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10911 and previous config saved to /var/cache/conftool/dbconfig/20200406-143755-marostegui.json
* 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10910 and previous config saved to /var/cache/conftool/dbconfig/20200406-143042-marostegui.json
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10909 and previous config saved to /var/cache/conftool/dbconfig/20200406-142607-marostegui.json
* 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
* 14:24 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki [[phab:T248664|T248664]] (cachebust) (duration: 00m 58s)
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
* 14:23 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki [[phab:T248664|T248664]] (duration: 00m 59s)
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
* 14:09 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
* 14:07 elukey@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
* 14:07 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
* 13:47 sukhe: upload cescout 0.1.1-1 to apt.wm.o (buster) - [[phab:T247273|T247273]]
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:26 elukey: reboot stat1008 as test to verify ROCm 3.3 upgrades
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entires for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 13:22 elukey: stat1008 upgraded to ROCm 3.3 (enables Tensorflow 2.x)
* 17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entires for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 13:05 ema: cache: upgrade varnish to 5.1.3-1wm13, begin rolling varnish-fe restarts [[phab:T249344|T249344]]
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
* 13:03 marostegui: Deploy schema change on db1118
* 17:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:03 jbond42: updating gnutls on buster
* 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48558 and previous config saved to /var/cache/conftool/dbconfig/20230525-172326-root.json
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P10906 and previous config saved to /var/cache/conftool/dbconfig/20200406-130320-marostegui.json
* 17:15 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 after schema change', diff saved to https://phabricator.wikimedia.org/P10905 and previous config saved to /var/cache/conftool/dbconfig/20200406-130255-marostegui.json
* 17:14 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 12:59 Urbanecm: Creation of grwikimedia is done ([[phab:T245911|T245911]])
* 17:14 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 12:59 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 22s)
* 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 12:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 17:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 12:54 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 12:53 marostegui: Deploy schema change on db1107
* 17:09 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 for schema change', diff saved to https://phabricator.wikimedia.org/P10904 and previous config saved to /var/cache/conftool/dbconfig/20200406-125308-marostegui.json
* 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 12:52 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 17:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P10903 and previous config saved to /var/cache/conftool/dbconfig/20200406-125222-marostegui.json
* 17:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 12:46 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: {{Gerrit|77b9ae9}}: Create grwikimedia
* 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 12:44 urbanecm@deploy1001: Synchronized dblists/: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 59s)
* 17:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 12:37 XioNoX: Update eqiad analytics filters with new APT IPs
* 16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct ([[phab:T328313|T328313]])
* 12:27 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
* 12:21 marostegui: Deploy schema change on db1089
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P10902 and previous config saved to /var/cache/conftool/dbconfig/20200406-122123-marostegui.json
* 16:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10901 and previous config saved to /var/cache/conftool/dbconfig/20200406-122058-marostegui.json
* 16:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 12:14 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:14 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 12:08 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:14 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 12:04 godog: test grafana 6.7.2 upgrade on grafana2001 - [[phab:T244208|T244208]]
* 16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 11:57 awight: EU swat complete
* 16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* {{safesubst:SAL entry|1=11:53 awight@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TwoColConflict: SWAT: [[gerrit:586309{{!}}Backport talk page and EventLogging changes (T248243, T249404) (duration: 00m 59s)}}
* 16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 11:48 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
* 11:48 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:586325{{!}}Create account creator and rollback groups on yowiki (T249487)]] (duration: 00m 59s)
* 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
* 11:32 awight@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/ContentTranslation: SWAT: [[gerrit:586311{{!}}Avoid failure on restoring draft with no categories (T249400)]] (duration: 01m 02s)
* 15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 11:25 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: double-syncing (duration: 00m 58s)
* 15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 11:24 marostegui: Deploy schema change on db1105:3311
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10900 and previous config saved to /var/cache/conftool/dbconfig/20200406-112417-marostegui.json
* 15:49 dancy@deploy1002: Finished deploy [integration/docroot@dac2b70]: Updated Scap URLs (duration: 00m 07s)
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10899 and previous config saved to /var/cache/conftool/dbconfig/20200406-112123-marostegui.json
* 15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
* 11:18 elukey: import AMD ROCm 3.3 packages in buster-wikimedia (component thirdparty/rocm33) - [[phab:T247082|T247082]]
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
* 11:17 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:580394{{!}}cirrus: Increase commonswiki near match weight (T245642)]] (duration: 00m 59s)
* 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 11:11 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:585779{{!}} Whitelist X-Wikimedia-Debug header for cross-wiki API requests (T249107)]] (duration: 00m 59s)
* 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:586305{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
* 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:586305{{!}} Bumping portals to master (563985)]] (duration: 01m 12s)
* 15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
* 09:50 XioNoX: push pfw firewall policies - [[phab:T249267|T249267]]
* 15:43 dancy@deploy1002: Finished deploy [integration/docroot@78e6f40]: (no justification provided) (duration: 00m 10s)
* 09:40 marostegui: Deploy schema change on db1099:3311
* 15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10898 and previous config saved to /var/cache/conftool/dbconfig/20200406-093944-marostegui.json
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
* 09:11 ema: cp2027: upgrade varnish to 5.1.3-1wm13 and restart varnish-fe [[phab:T249344|T249344]]
* 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 09:08 ema: upload varnish 5.1.3-1wm13 to buster-wikimedia on apt1001.wm.org [[phab:T249344|T249344]]
* 15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 08:55 ariel@deploy1001: Finished deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link (duration: 00m 09s)
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 08:55 ariel@deploy1001: Started deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link
* 15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 08:54 elukey: bootstrap wdqs200[7,8] - [[phab:T246343|T246343]]
* 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
* 08:50 marostegui: Deploy schema change on db1139:3311
* 15:27 kartik@deploy1002: Finished scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 07m 01s)
* 08:18 _joe_: conversion of codfw api done
* 15:22 kartik@deploy1002: kartik: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:07 marostegui: Deploy schema change on dbstore1003:3311
* 15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 07:54 vgutierrez: rolling restart of ats-tls to disable wmf-analytics log - [[phab:T249335|T249335]] [[phab:T237993|T237993]]
* 15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 07:50 dcausse: search index: deleting stale index wikidatawiki_content_1585224806 on cloudelastic:9243
* 15:20 kartik@deploy1002: Started scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 07:49 _joe_: eqiad API migrated to envoy for local TLS termination, now starting codfw
* 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json
* 07:35 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 as attempt to fix heavy GC runs (old gen) - [[phab:T231517|T231517]]
* 15:18 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 68m 07s)
* 07:35 marostegui: Rename wb_terms on eqiad excluding labsdb1009, labdb1010, labsdb1011 - [[phab:T248086|T248086]]
* 15:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
* 07:06 marostegui: Rename wb_terms on codfw - [[phab:T248086|T248086]]
* 15:10 topranks: Migrating cr1-eqiad downlink to row E/F from lsw1-e1-eqiad et-0/0/48 to ssw1-e1-eqiad et-0/0/31
* 06:45 XioNoX: delete BGP to AS25074 in amsix
* 15:10 mutante: gerrit-replica.wikimedia.org - gerrit2002 - reimaging - scheduled maintenance
* 06:36 _joe_: converting the api servers to envoy for TLS in eqiad
* 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 06:30 marostegui: Upgrade dbproxy1019 - [[phab:T231520|T231520]]
* 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 06:18 marostegui: Deploy schema change on s1 codfw master, this will generate lag on codfw
* 15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json
* 05:50 vgutierrez: ats-tls restart in cp3056, cp3058 and cp3062 - [[phab:T249335|T249335]]
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P10897 and previous config saved to /var/cache/conftool/dbconfig/20200406-054559-marostegui.json
* 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 05:18 marostegui: Deploy schema change on db1079 (this will generate lag on s7 labs)
* 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P10896 and previous config saved to /var/cache/conftool/dbconfig/20200406-051744-marostegui.json
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json
* 05:16 vgutierrez: Enable inbound TLSv1.3 in upload@eqiad - [[phab:T170567|T170567]]
* 14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 [[phab:T337446|T337446]]
* 05:16 vgutierrez: Enable TLS Session Tickets on eqiad - [[phab:T245616|T245616]]
* 14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 05:03 vgutierrez: ats-tls restart in cp1075, cp1081 and cp1087 - [[phab:T249335|T249335]]
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
* 14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
* 14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
* 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
* 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1024']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
* 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 14:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 14:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=parsoid-php,dc=eqiad
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
* 14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:11 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:10 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
* 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 15m 56s)
* 13:53 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:52 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 13:46 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]] (duration: 07m 42s)
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:38 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]]
* 13:28 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] (duration: 09m 06s)
* 13:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:20 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:19 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]]
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s5, s2, s7', diff saved to https://phabricator.wikimedia.org/P48538 and previous config saved to /var/cache/conftool/dbconfig/20230525-121012-root.json
* 11:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48537 and previous config saved to /var/cache/conftool/dbconfig/20230525-113914-root.json
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48536 and previous config saved to /var/cache/conftool/dbconfig/20230525-112409-root.json
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:21 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:20 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:15 jbond: update udplog on mwlog server
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48535 and previous config saved to /var/cache/conftool/dbconfig/20230525-110948-root.json
* 11:09 jbond: upload udplog_1.10_amd64.deb
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48534 and previous config saved to /var/cache/conftool/dbconfig/20230525-110905-root.json
* 11:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 11:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48533 and previous config saved to /var/cache/conftool/dbconfig/20230525-105443-root.json
* 10:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
* 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48532 and previous config saved to /var/cache/conftool/dbconfig/20230525-105400-root.json
* 10:53 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 10:52 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:48 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
* 10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
* 10:32 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:24 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2005-dev.wikimedia.org
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48528 and previous config saved to /var/cache/conftool/dbconfig/20230525-102434-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48527 and previous config saved to /var/cache/conftool/dbconfig/20230525-102351-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48526 and previous config saved to /var/cache/conftool/dbconfig/20230525-101940-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48525 and previous config saved to /var/cache/conftool/dbconfig/20230525-100927-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48524 and previous config saved to /var/cache/conftool/dbconfig/20230525-100846-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48523 and previous config saved to /var/cache/conftool/dbconfig/20230525-100436-root.json
* 10:00 kart_: Updated cxserver to 2023-05-25-093623-production (config: language pairs transform fix + [[phab:T331201|T331201]])
* 09:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 09:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48522 and previous config saved to /var/cache/conftool/dbconfig/20230525-095423-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48521 and previous config saved to /var/cache/conftool/dbconfig/20230525-095341-root.json
* 09:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 09:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48520 and previous config saved to /var/cache/conftool/dbconfig/20230525-094931-root.json
* 09:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
* 09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000  to dumpsdata1006, copying all public xml dumps data
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
* 09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
* 08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - [[phab:T337248|T337248]]
* 07:52 matthiasmullie: UTC morning backports done
* 07:51 mlitn@deploy1002: Finished scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] (duration: 16m 12s)
* 07:37 mlitn@deploy1002: mlitn: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:35 mlitn@deploy1002: Started scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]]
* 07:18 mlitn@deploy1002: Finished scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] (duration: 14m 02s)
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P48511 and previous config saved to /var/cache/conftool/dbconfig/20230525-071719-root.json
* 07:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 07:06 mlitn@deploy1002: mlitn: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:04 mlitn@deploy1002: Started scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1196', diff saved to https://phabricator.wikimedia.org/P48509 and previous config saved to /var/cache/conftool/dbconfig/20230525-064418-root.json
* 06:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P48506 and previous config saved to /var/cache/conftool/dbconfig/20230525-055734-root.json
* 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P48504 and previous config saved to /var/cache/conftool/dbconfig/20230525-055236-root.json
* 05:48 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P48503 and previous config saved to /var/cache/conftool/dbconfig/20230525-051923-root.json
* 02:14 eileen: civicrm upgraded from {{Gerrit|b8cab6f6}} to {{Gerrit|415aa7e5}}
* 02:14 eileen: civicrm upgraded from {{Gerrit|b8cab6f6}} to {{Gerrit|415aa7e5}}


== 2020-04-03 ==
== 2023-05-24 ==
* 21:17 andrewbogott: ugpraded wikitech-static to 1.34.1
* 21:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] (duration: 09m 40s)
* 17:58 mutante: rsync home dirs from install1002 to apt1001:/srv/home_install1002...
* 21:10 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:43 ema: cp3061: restart varnish-fe [[phab:T249344|T249344]]
* 21:08 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]]
* 15:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:55 samtar@deploy1002: Finished scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 15s)
* 15:19 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:48 samtar@deploy1002: samtar: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 15:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:47 samtar@deploy1002: Started scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 15:18 ema: cp3057: restart varnish-fe [[phab:T249344|T249344]]
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 31s)
* 14:37 hashar: Restarting Jenkins for a CSP parameter [[phab:T245658|T245658]]
* 20:18 samtar@deploy1002: samtar: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:07 vgutierrez: restart ats-tls on cp1087 - [[phab:T249335|T249335]]
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10882 and previous config saved to /var/cache/conftool/dbconfig/20200403-140132-marostegui.json
* 20:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:55 vgutierrez: restart ats-tls on cp1075 and cp1081 - [[phab:T249335|T249335]]
* 20:08 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:49 marostegui: Deploy schema change on db1090:3317
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10881 and previous config saved to /var/cache/conftool/dbconfig/20200403-124908-marostegui.json
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P10880 and previous config saved to /var/cache/conftool/dbconfig/20200403-124827-marostegui.json
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing [[phab:T249196|T249196]] (duration: 00m 43s)
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing [[phab:T249196|T249196]]
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:27 marostegui: Deploy schema change on db1136
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P10879 and previous config saved to /var/cache/conftool/dbconfig/20200403-122716-marostegui.json
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P10878 and previous config saved to /var/cache/conftool/dbconfig/20200403-122259-marostegui.json
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 marostegui: Deploy schema change on db1094
* 19:12 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P10877 and previous config saved to /var/cache/conftool/dbconfig/20200403-115959-marostegui.json
* 19:06 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330216|T330216]]
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20200403-115854-marostegui.json
* 18:55 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 11:40 marostegui: Deploy schema change on db1098:3317
* 18:49 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10875 and previous config saved to /var/cache/conftool/dbconfig/20200403-114004-marostegui.json
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10874 and previous config saved to /var/cache/conftool/dbconfig/20200403-113717-marostegui.json
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 10:38 marostegui: Deploy schema change on db1101:3317
* 18:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 10:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|861b267}}: Enable cswiki anniversary logo ([[phab:T249173|T249173]]) (duration: 01m 02s)
* 18:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10872 and previous config saved to /var/cache/conftool/dbconfig/20200403-103746-marostegui.json
* 18:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:32 marostegui: Deploy schema on db1116:3317
* 17:22 ejegg: civicrm upgraded from {{Gerrit|4251dfa1}} to {{Gerrit|b8cab6f6}}
* 08:43 marostegui: Deploy schema change on dbstore1003:3317
* 16:54 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance (duration: 00m 09s)
* 07:57 marostegui: Deploy schema change on s7 codfw master, this will generate lag on codfw
* 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance
* 06:55 XioNoX: add fastnetmon 1.1.4 to buster-wikimedia - [[phab:T240658|T240658]]
* 16:05 elukey: move kafka mirror on kafka main brokers to PKI - [[phab:T337248|T337248]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P10870 and previous config saved to /var/cache/conftool/dbconfig/20200403-062529-marostegui.json
* 16:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] (duration: 08m 33s)
* 05:21 marostegui: Deploy schema change on db1126
* 15:56 elukey: move kafka mirror on kafka jumbo brokers to PKI - [[phab:T337248|T337248]]
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P10869 and previous config saved to /var/cache/conftool/dbconfig/20200403-052115-marostegui.json
* 15:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 00:42 catrope@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/FlaggedRevs/: Fix logic for determining if pending edits were null ([[phab:T249277|T249277]]) (duration: 01m 00s)
* 15:52 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]]
* 15:47 ejegg: payments-wiki upgraded from {{Gerrit|e02bc7c5}} to {{Gerrit|c2f9f8b5}}
* 15:39 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] (duration: 01m 35s)
* 15:38 ejegg: standalone SmashPig upgraded from {{Gerrit|5460dbe2}} to {{Gerrit|db23b998}}
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363]
* 15:37 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] (duration: 00m 04s)
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363]
* 15:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:32 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:31 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:31 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] (duration: 06m 13s)
* 15:31 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:30 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:25 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:25 aqu@deploy1002: Started deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363]
* 15:24 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:18 aqu: analytics-refinery, about to deploy
* 15:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 14:19 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] (duration: 12m 11s)
* 14:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]] (duration: 00m 07s)
* 14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]]
* 14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]]
* 14:06 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] (duration: 09m 21s)
* 13:58 urbanecm@deploy1002: matmarex and urbanecm and sgimeno: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] synced t
* 13:56 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]]
* 13:55 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]] (duration: 07m 06s)
* 13:48 urbanecm@deploy1002: Started scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]]
* 13:36 samtar@deploy1002: Finished scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] (duration: 08m 04s)
* 13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:28 samtar@deploy1002: Started scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]]
* 13:26 samtar@deploy1002: Finished scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] (duration: 10m 15s)
* 13:17 samtar@deploy1002: samtar and dcausse: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:16 samtar@deploy1002: Started scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]]
* 13:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] (duration: 07m 53s)
* 13:07 samtar@deploy1002: herron and samtar: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:07 xSavitar: tools.codesearch Deployed https://gerrit.wikimedia.org/r/c/labs/codesearch/+/909258 and also restarted tool instances to core search backend was dead.
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]]
* 12:55 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript findBadBlobs --wiki nowiki --revisions {{Gerrit|5227369}} --mark [[phab:T337392|T337392]]` [[phab:T337392|T337392]]
* 12:47 tgr_: running changeWikiConfig.php on Growth pilot wikis for [[phab:T337348|T337348]]
* 10:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
* 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
* 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
* 09:04 dcausse@deploy1002: Finished deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset (duration: 00m 17s)
* 09:04 dcausse@deploy1002: Started deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset
* 08:52 claime: repooling mw2248.codfw.wmnet - [[phab:T334429|T334429]]
* 08:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
* 08:50 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
* 08:49 marostegui: Stop mariadb on db1154 (sanitarium) there will be lag on clouddb* hosts
* 08:36 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]] (duration: 07m 20s)
* 08:28 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]]
* 07:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 07:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 07:41 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 07:40 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
* 05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
* 01:19 mutante: contint2001 - jenkins started again
* 01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:45 mutante: short maintenance on main contint server (jenkins)
* 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance


== 2020-04-02 ==
== 2023-05-23 ==
* 23:53 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
* 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
* 23:09 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't try to grant 'oathauth-enable' to '*' (part 2) ([[phab:T248282|T248282]]) (duration: 00m 58s)
* 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 19:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Translate/specials/SpecialExportTranslations.php: [[phab:T249258|T249258]]: Revert 'Special:ExportTranslations: Disallow exporting huge groups' (duration: 00m 59s)
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 19:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]] (duration: 15m 13s)
* 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
* 19:35 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/MovePage.php: [[phab:T248789|T248789]] MovePage: Use correct Title when creating the null revision (duration: 00m 59s)
* 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - [[phab:T324659|T324659]]
* 19:30 hashar: docker-pkg update on contint hosts
* 22:00 eileen: civicrm upgraded from {{Gerrit|11538e23}} to {{Gerrit|4251dfa1}}
* 19:30 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 12s)
* 21:26 ejegg: payments-wiki upgraded from {{Gerrit|a7567c6a}} to {{Gerrit|e02bc7c5}}
* 19:29 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:23 ppchelko@deploy1001: Started deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]]
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:05 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 21:02 TheresNoTime: close UTC late backport window
* 19:00 longma: promoting all to 1.35.0-wmf.26
* 21:01 samtar@deploy1002: Finished scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] (duration: 11m 47s)
* 18:39 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]] (duration: 01m 05s)
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:38 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:37 longma: rolling group1 to 1.35.0-wmf.26
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/MobileFrontend/: SWAT: {{Gerrit|4e2a092}}: EditorGateway: Fix handling of null sectionId ([[phab:T249169|T249169]]) (duration: 01m 09s)
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:22 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/VisualEditor/modules/ve-mw: SWAT: {{Gerrit|94ded03}}: Fix issues with treating section "numbers" as integers ([[phab:T248795|T248795]]; [[phab:T248968|T248968]]; [[phab:T249112|T249112]]) (duration: 01m 10s)
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}} (duration: 03m 21s)
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:45 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}}
* 20:51 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 16:53 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8] (duration: 00m 08s)
* 20:50 samtar@deploy1002: Started scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]]
* 16:53 joal@deploy1001: Started deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8]
* 20:48 samtar@deploy1002: Finished scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] (duration: 11m 20s)
* 16:49 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/actions/Action.php: [[phab:T249162|T249162]] Partially revert 'WikiPage/Article split. Rely on Article inside Action' (duration: 01m 07s)
* 20:38 samtar@deploy1002: samtar: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:44 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8] (duration: 13m 50s)
* 20:37 ejegg: civicrm upgraded from {{Gerrit|efe25c9b}} to {{Gerrit|11538e23}}
* 16:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:37 samtar@deploy1002: Started scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]]
* 16:34 volans@cumin1001: START - Cookbook sre.dns.netbox
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:33 jforrester@deploy1001: sync-file aborted: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 00m 00s)
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 01m 07s)
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:30 joal@deploy1001: Started deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8]
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:19 XioNoX: upgrade netflow4001's fastnetmon to 1.1.4 - [[phab:T240658|T240658]]
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:56 XioNoX: push new test switch config for cloudvirt2001 - [[phab:T248425|T248425]]
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:33 vgutierrez: Enable inbound TLSv1.3 in upload@codfw - [[phab:T170567|T170567]]
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:33 vgutierrez: Enable TLS Session tickets in codfw - [[phab:T245616|T245616]]
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:24 jbond42: updating bluez on ganeti and cloudvirt
* 19:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10865 and previous config saved to /var/cache/conftool/dbconfig/20200402-142338-marostegui.json
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10864 and previous config saved to /var/cache/conftool/dbconfig/20200402-141802-marostegui.json
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10863 and previous config saved to /var/cache/conftool/dbconfig/20200402-141335-marostegui.json
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10862 and previous config saved to /var/cache/conftool/dbconfig/20200402-141149-marostegui.json
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:50 marostegui: Compress wbqc_constraints on testcommonswiki and commonswiki (empty tables) - [[phab:T248967|T248967]]
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 vgutierrez: update puppet compiler facts
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:40 marostegui: Deploy schema change on db1111
* 19:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P10861 and previous config saved to /var/cache/conftool/dbconfig/20200402-133956-marostegui.json
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:32 gehel: OSM data reimport on maps2004 - [[phab:T249086|T249086]]
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:55 mutante: mw1390 - mw1399 - pooled and active but status "staged" in netbox, fixing to 'active'
* 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:52 mutante: mw1297 - is pooled and serving traffic but status "staged" in netbox. set to "active"
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after schema change', diff saved to https://phabricator.wikimedia.org/P10858 and previous config saved to /var/cache/conftool/dbconfig/20200402-114020-marostegui.json
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:06 mutante: decom planet1001 ([[phab:T248863|T248863]])
* 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 10:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:42 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 10:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:19 marostegui: Deploy schema change on db1087, this will generate lag on s8 on wiki replicas
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P10857 and previous config saved to /var/cache/conftool/dbconfig/20200402-101920-marostegui.json
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:17 elukey: set up TLS encryption for all pmacct instances on netflow* to Kafka Jumbo
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P10856 and previous config saved to /var/cache/conftool/dbconfig/20200402-101747-marostegui.json
* 19:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 09:47 marostegui: Remove haproxy@10.64.37.14 from labsdb hosts - [[phab:T231280|T231280]] [[phab:T248944|T248944]]
* 19:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 09:44 gehel: CORRECTION: depool maps2004 for data reimport - [[phab:T249086|T249086]]
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1027
* 09:40 gehel: depool wdqs2004 for data reimport - [[phab:T249086|T249086]]
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1027
* 09:33 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 18s)
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1026
* 09:32 oblivian@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1026
* 09:28 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@4f86d77]: (no justification provided) (duration: 00m 09s)
* 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
* 09:28 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4f86d77]: (no justification provided)
* 19:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 08:51 marostegui: Deploy schema change db1104
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for schema change', diff saved to https://phabricator.wikimedia.org/P10854 and previous config saved to /var/cache/conftool/dbconfig/20200402-085057-marostegui.json
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P10853 and previous config saved to /var/cache/conftool/dbconfig/20200402-085019-marostegui.json
* 19:31 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1025
* 08:28 gehel: repooling wdqs1006 - catched up on lag
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 08:22 vgutierrez: Enable inbound TLSv1.3 in upload@esams - [[phab:T170567|T170567]]
* 19:30 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
* 08:21 vgutierrez: Enable TLS Session tickets in esams - [[phab:T245616|T245616]]
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 07:45 moritzm: bounced ferm on ms-be1040
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 07:27 marostegui: Deploy schema change on db1092
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P10850 and previous config saved to /var/cache/conftool/dbconfig/20200402-072730-marostegui.json
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10849 and previous config saved to /var/cache/conftool/dbconfig/20200402-072500-marostegui.json
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 05:49 marostegui: Deploy schema change on db1101:3318
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10848 and previous config saved to /var/cache/conftool/dbconfig/20200402-054931-marostegui.json
* 19:25 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
* 05:29 elukey: powercycle analytics1045 (host not responsive to ssh, weird chars showed in mgmt serial console)
* 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1022
* 19:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 19:24 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1022
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts [[phab:T337327|T337327]]
* 18:26 ryankemper: [WDQS] [[phab:T337327|T337327]] Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
* 18:16 ryankemper: [WDQS] Rolled back requestctl rule
* 18:12 ryankemper: [WDQS] [[phab:T337327|T337327]] New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:03 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]] and [[phab:T333140|T333140]]
* 17:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
* 16:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:50 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]], part 2
* 16:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:42 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]]
* 16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:31 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Rename page content change enrich error stream to match convention - [[phab:T336656|T336656]] (duration: 06m 58s)
* 16:22 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]] (duration: 36m 02s)
* 15:56 topranks: moving lvs1018 connection to rack E1 from lsw1-e1-eqiad to ssw1-e1-eqiad [[phab:T322937|T322937]]
* 15:46 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]]
* 15:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 sukhe: stop pybal on lvs1018: [[phab:T322937|T322937]]
* 15:38 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bullseye
* 15:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:20 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:20 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:19 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 15:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bullseye
* 15:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases1003.eqiad.wmnet with OS bullseye
* 15:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:00 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
* 14:58 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:58 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
* 14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
* 14:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 herron@cumin1001: START - Cookbook sre.dns.netbox
* 14:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
* 13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
* 13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:50 herron@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
* 13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:45 hoo@deploy1002: Finished scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] (duration: 12m 49s)
* 13:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 13:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 13:43 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases1003.eqiad.wmnet
* 13:33 hoo@deploy1002: hoo: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:32 hoo@deploy1002: Started scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]]
* 13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
* 10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
* 09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
* 09:50 stevemunene: reboot an-test-master1002.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
* 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48492 and previous config saved to /var/cache/conftool/dbconfig/20230523-094216-root.json
* 09:42 stevemunene: reboot an-test-worker1003.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
* 09:34 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48491 and previous config saved to /var/cache/conftool/dbconfig/20230523-092711-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48490 and previous config saved to /var/cache/conftool/dbconfig/20230523-091207-root.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48489 and previous config saved to /var/cache/conftool/dbconfig/20230523-085702-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48488 and previous config saved to /var/cache/conftool/dbconfig/20230523-085246-root.json
* 08:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1122.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48485 and previous config saved to /var/cache/conftool/dbconfig/20230523-082653-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48484 and previous config saved to /var/cache/conftool/dbconfig/20230523-082237-root.json
* 08:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] (duration: 08m 37s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]]
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48480 and previous config saved to /var/cache/conftool/dbconfig/20230523-075227-root.json
* 07:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:51 hashar@deploy1002: Started deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:47 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] (duration: 07m 19s)
* 07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:41 marostegui@deploy1002: marostegui: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:39 marostegui@deploy1002: Started scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48479 and previous config saved to /var/cache/conftool/dbconfig/20230523-073841-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48478 and previous config saved to /var/cache/conftool/dbconfig/20230523-073722-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48477 and previous config saved to /var/cache/conftool/dbconfig/20230523-073710-root.json
* 07:36 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T337285|T337285]]
* 07:25 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] (duration: 07m 16s)
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
* 07:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]]
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 42s)
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48475 and previous config saved to /var/cache/conftool/dbconfig/20230523-070713-root.json
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48474 and previous config saved to /var/cache/conftool/dbconfig/20230523-070547-root.json
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]]
* 07:00 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] (duration: 06m 58s)
* 06:54 marostegui@deploy1002: marostegui: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 06:53 marostegui@deploy1002: Started scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48473 and previous config saved to /var/cache/conftool/dbconfig/20230523-065042-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change es1020 weight', diff saved to https://phabricator.wikimedia.org/P48472 and previous config saved to /var/cache/conftool/dbconfig/20230523-064850-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48471 and previous config saved to /var/cache/conftool/dbconfig/20230523-064820-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48470 and previous config saved to /var/cache/conftool/dbconfig/20230523-064729-root.json
* 06:46 marostegui: Starting es4 eqiad failover from es1021 to es1020 - [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48469 and previous config saved to /var/cache/conftool/dbconfig/20230523-063836-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48468 and previous config saved to /var/cache/conftool/dbconfig/20230523-063538-root.json
* 06:26 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] (duration: 08m 21s)
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48467 and previous config saved to /var/cache/conftool/dbconfig/20230523-062033-root.json
* 06:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 06:18 marostegui@deploy1002: Started scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]]
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48466 and previous config saved to /var/cache/conftool/dbconfig/20230523-060528-root.json
* 06:04 kart_: cxserver: Remove Flores MT service ([[phab:T331505|T331505]])
* 06:03 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48465 and previous config saved to /var/cache/conftool/dbconfig/20230523-055024-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48464 and previous config saved to /var/cache/conftool/dbconfig/20230523-053519-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48463 and previous config saved to /var/cache/conftool/dbconfig/20230523-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.8 (duration: 02m 17s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 49m 04s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 02:57 eileen: civicrm upgraded from {{Gerrit|3329155a}} to {{Gerrit|6642b602}}
* 02:22 eileen: civicrm upgraded from {{Gerrit|7eae24d5}} to {{Gerrit|3329155a}}


== 2020-04-01 ==
== 2023-05-22 ==
* 22:44 volker-e@deploy1001: Finished deploy [design/style-guide@4bfe647]: Deploy design/style-guide:  (duration: 00m 08s)
* 23:29 eileen: civicrm upgraded from {{Gerrit|cc9593d0}} to {{Gerrit|7eae24d5}}
* 22:43 volker-e@deploy1001: Started deploy [design/style-guide@4bfe647]: Deploy design/style-guide:
* 23:16 zabe@deploy1002: Finished scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] (duration: 06m 58s)
* 22:02 volans: forcing logrotate on netflow2001 to compress yesterday's logs
* 23:11 zabe@deploy1002: zabe: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 21:53 volans: force-rebooting ms-be1023, unresponsive - [[phab:T249174|T249174]]
* 23:09 zabe@deploy1002: Started scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]]
* 21:50 volans: stopped and restarted kafkatee-webrequest.service on netflow2001, was in a restart loop
* 21:38 sbassett: Deployed security mitigations for [[phab:T333140|T333140]] and [[phab:T336027|T336027]]
* 19:48 marxarelli: rollback of 1.35.0-wmf.26 from group1 ([[phab:T247773|T247773]]). blocked by [[phab:T249162|T249162]]
* 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet
* 19:30 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rollback 1.35.0-wmf.26 from group1
* 20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:21 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26 (duration: 01m 06s)
* 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26
* 20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 19:18 marxarelli: promoting group1 to 1.35.0-wmf.26 to group1
* 20:51 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 17:21 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqord*' commit 'enable sampling on eqord Iac15379cc'
* 20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet
* 16:54 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqdfw*' commit 'enable sampling on eqdfw Iac15379cc'
* 20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet
* 16:39 vgutierrez: pool cp2027 - [[phab:T248816|T248816]]
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 16:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 16:17 ariel@deploy1001: Finished deploy [dumps/dumps@21363c1]: page range prefetch fixup (duration: 00m 09s)
* 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 16:17 ariel@deploy1001: Started deploy [dumps/dumps@21363c1]: page range prefetch fixup
* 20:33 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1005.eqiad.wmnet
* 15:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:27 TheresNoTime: close UTC late backport window
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] (duration: 07m 47s)
* 15:31 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 20:17 samtar@deploy1002: samtar and superpes: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]]
* 15:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] (duration: 08m 22s)
* 15:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 15:27 vgutierrez: depool & decommission cp20[16,19,23,27] - [[phab:T249125|T249125]]
* 20:09 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10845 and previous config saved to /var/cache/conftool/dbconfig/20200401-152258-marostegui.json
* 20:08 samtar@deploy1002: superpes and samtar: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:11 herron: performing kafka-main rolling restarts to pick up security updates
* 20:06 samtar@deploy1002: Started scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]]
* 14:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:43 vgutierrez: depool && decommission cp[2018,2020,2022,2024-2026].codfw.wmnet - [[phab:T249115|T249115]]
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gehel: depooling wdqs1006 to allow catching up on lag
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 vgutierrez: pool cp2042 - [[phab:T248816|T248816]]
* 17:04 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) (duration: 00m 17s)
* 14:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided)
* 14:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:58 XioNoX: push mgmt_junos to all L2 switches
* 14:09 XioNoX: remove AS-path prepending in esams
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2009.codfw.wmnet
* 13:47 XioNoX: remove AS-path prepending in eqsin
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
* 13:39 vgutierrez: pool cp2041 - [[phab:T248816|T248816]]
* 15:57 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
* 13:34 mutante: sodium (mirror): sudo -u mirror ftpsync to get Debian mirror updated (Icinga says it's old)
* 15:56 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2009.codfw.wmnet
* 13:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 13:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 13:17 marostegui: Deploy schema change on db1099:3318
* 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10843 and previous config saved to /var/cache/conftool/dbconfig/20200401-131719-marostegui.json
* 15:25 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 13:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 13:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:32 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:31 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 12:19 tgr@deploy1001: Synchronized wmf-config/config: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 06s)
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:10 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 12:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm
* 12:17 tgr@deploy1001: Synchronized dblists/growthexperiments.dblist: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 05s)
* 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 12:17 XioNoX: restart nfacct on netflow4001 for kafka tls tests - [[phab:T248980|T248980]]
* 12:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 12:15 vgutierrez: depool & decommission cp2013 - [[phab:T249088|T249088]]
* 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm
* 12:14 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 06s)
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor1003.eqiad.wmnet with OS bookworm
* 12:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585059{{!}}Enable password-reset-update on all other than Wikipedias (T245791)]] (duration: 01m 07s)
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 12:09 marostegui: Deploy schema change on db1116:3318
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P48456 and previous config saved to /var/cache/conftool/dbconfig/20230522-115936-root.json
* 12:05 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons take 2 (duration: 01m 08s)
* 11:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 12:04 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons (duration: 01m 05s)
* 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor1003.eqiad.wmnet with OS bookworm
* 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]; take II) (duration: 01m 05s)
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path [[phab:T337220|T337220]]
* 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]) (duration: 01m 07s)
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path
* 11:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable WikibaseQualityConstraints on Commons take II (duration: 01m 06s)
* 10:06 hashar@deploy1002: Finished scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] (duration: 37m 00s)
* 11:44 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable WikibaseQualityConstraints on Commons (duration: 01m 18s)
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet
* 11:20 cormacparle__: created table wbqc_constraints on commonswiki
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 11:03 jbond42: install bluez update on ganeti-canary and cloudvirt/cloudcontrol-dev
* 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 11:01 mutante: planet1001 - reinstall OS to test install_server switch, ATS switched to planet1002 earlier
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors
* 10:47 marostegui: Deploy schema change on dbstore1005:3318
* 10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors
* 10:25 vgutierrez: pool cp2040 - [[phab:T248816|T248816]]
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:16 oblivian@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: service=canary
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 09:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 09:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:02 moritzm: installing updated usb.ids packages for Bullseye
* 09:37 marostegui: Deploy schema change on s8 codfw, this will generate lag on codfw
* 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:35 XioNoX: Update install servers IPs (dhcp helpers + firewall rules) - [[phab:T224576|T224576]]
* 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet
* 09:34 mutante: install_servers: DHCP_relay in routers and TFTP server in DHCP server config have been switched from install1002/2002 to install1003/2003 - doing a test install, but if any issues report on [[phab:T224576|T224576]]
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet
* 09:26 marostegui: last entry was for db2093
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:26 marostegui: Downgrade mariadb package from 10.4.12-2 to 10.4.12-1
* 09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors
* 09:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors
* 09:05 mutante: planet - the backend server has been switched from planet1001 (stretch) to planet1002 (buster) - [[phab:T247651|T247651]]
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:46 mutante: deneb, boron: systemctl reset-failed to clear up systemd state alerts
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 08:43 marostegui: Stop haproxy on dbproxy1010 [[phab:T248944|T248944]]
* 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 08:37 jynus: restart bacula at backup1001
* 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor1003.eqiad.wmnet
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 09:39 hashar@deploy1002: hashar: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:29 hashar@deploy1002: Started scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]]
* 08:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:46 marostegui: Stop mysql on db2160 (haproxy irc alerts will be generated)
* 08:28 vgutierrez: depool & decommission cp2017 - [[phab:T249084|T249084]]
* 08:28 elukey: drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin
* 08:21 vgutierrez: pool cp2039 - [[phab:T248816|T248816]]
* 08:22 moritzm: installing systemd security updates
* 08:09 marostegui: Deploy schema change on db1138 (s4 primary master)
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json
* 08:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json
* 08:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 after schema change', diff saved to https://phabricator.wikimedia.org/P10841 and previous config saved to /var/cache/conftool/dbconfig/20200401-071339-marostegui.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json
* 07:12 vgutierrez: pool cp2038 - [[phab:T248816|T248816]]
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json
* 06:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json
* 06:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 06:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 06:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 06:36 vgutierrez: depool & decommission cp2012 - [[phab:T249080|T249080]]
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json
* 06:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 06:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json
* 05:39 marostegui: Deploy schema change on db1121 (this will create lag on s4 labs)
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P10840 and previous config saved to /var/cache/conftool/dbconfig/20200401-053827-marostegui.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json
* 00:39 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Update http and prot rel links to https, fix link to sitelist in MW Core (duration: 01m 06s)
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json
* 00:12 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Add export-0.11 (duration: 01m 05s)
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json
* 06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json
* 06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primaryT337203', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json
* 05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json


== 2020-03-31 ==
== 2023-05-21 ==
* 22:23 marxarelli: group0 to 1.35.0-wmf.26 ([[phab:T247773|T247773]]); no rise in error rates following redeployment
* 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 22:13 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.26
* 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 22:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 21:54 dduvall@deploy1001: sync aborted: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]]) (duration: 07m 31s)
* 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 21:47 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 21:46 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/user/UserNameUtils.php: [[phab:T249045|T249045]] Use wfMessage in UserNameUtils::isUsable for now (duration: 00m 58s)
* 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 21:05 eileen: process-control config revision is {{Gerrit|f80d248113}} - (catch up dedupe now off - fyi MBeat )
* 20:59 hashar: contint1001: manually reverted /lib/systemd/system/jenkins.service
* 20:51 hashar: Restarting Jenkins for new CSP rules # [[phab:T245658|T245658]]
* 20:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rolling back 1.35.0-wmf.26 testwiki deployment following significant increase in error rate (cc [[phab:T247773|T247773]])
* 20:14 marxarelli: correction: RequestContext::getLanguage errors are for testwiki deployment, pre group0
* 20:08 marxarelli: a slew of "ErrorException from line 334 of /srv/mediawiki/php-1.35.0-wmf.26/includes/context/RequestContext.php: PHP Warning: Recursion detected in RequestContext::getLanguage" after group0 deployment (cc [[phab:T247773|T247773]])
* 20:04 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache (duration: 142m 48s)
* 19:20 ariel@deploy1001: Finished deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly (duration: 00m 04s)
* 19:20 ariel@deploy1001: Started deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly
* 18:08 ariel@deploy1001: Finished deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date (duration: 00m 05s)
* 18:07 ariel@deploy1001: Started deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date
* 17:42 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache
* 17:40 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.23 (duration: 26m 51s)
* 17:38 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad.service on cloudelastic1001 to see if it recovers from a trashing/gc state - [[phab:T231517|T231517]]
* 16:30 marxarelli: 1.35.0-wmf.26 was branched at {{Gerrit|bec758b668aaa57fc259a1d0ecf3b35340d2661b}} for [[phab:T247773|T247773]]
* 16:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 00s)
* 16:15 vgutierrez: pool cp2037 - [[phab:T248816|T248816]]
* 15:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:35 mutante: decom mw1254 through mw1258 (last remaining old servers in rack D5, depooled a while ago and average response time is again under 200ms) [[phab:T247780|T247780]]
* 15:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:26 vgutierrez: depool & decommission cp2010 - [[phab:T249002|T249002]]
* 15:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245794|T245794]] Enable DiscussionTools as a beta feature on four wikis (duration: 01m 00s)
* 15:05 cdanis: cr1-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 15:01 cdanis: cr2-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 14:43 vgutierrez: pool cp2036 - [[phab:T248816|T248816]]
* 14:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw125[4-8].eqiad.wmnet
* 14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1091 after schema change', diff saved to https://phabricator.wikimedia.org/P10834 and previous config saved to /var/cache/conftool/dbconfig/20200331-141459-marostegui.json
* 14:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
* 13:31 vgutierrez: Enable TLS Session tickets in eqsin - [[phab:T245616|T245616]]
* 13:05 XioNoX: update nat on pfw3-codfw - [[phab:T248906|T248906]]
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:49 _joe_: switching all appserver canaries to envoy
* 12:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:45 marostegui: Deploy schema change on db1091
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for schema change', diff saved to https://phabricator.wikimedia.org/P10833 and previous config saved to /var/cache/conftool/dbconfig/20200331-124452-marostegui.json
* 12:34 _joe_: transitioning mw1261 to envoy
* 12:23 vgutierrez: rolling upgrade of ATS to version 8.0.6-1wm5 - [[phab:T248938|T248938]]
* 11:30 Lucas_WMDE: EU SWAT done
* 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]], take II (duration: 00m 57s)
* 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]] (duration: 00m 58s)
* 11:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]], take II (duration: 00m 59s)
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]] (duration: 01m 00s)
* 10:46 _joe_: disabled puppet on canary appservers, potentially dangerous change ahead
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084 after schema change', diff saved to https://phabricator.wikimedia.org/P10831 and previous config saved to /var/cache/conftool/dbconfig/20200331-101953-marostegui.json
* 10:03 XioNoX: add BGP to AS41327 in AMS-IX
* 09:49 XioNoX: push homer diffs to mr1-eqsin
* 09:36 XioNoX: push homer diffs to mr1-eqiad
* 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 09:05 vgutierrez: upload trafficserver 8.0.5-1wm6 to apt.wm.o (buster) - [[phab:T248938|T248938]]
* 09:00 vgutierrez: depool & decommission cp2011 - [[phab:T248950|T248950]]
* 08:44 vgutierrez: pool cp2035 - [[phab:T248816|T248816]]
* 08:31 mutante: signed puppet cert for planet1002.eqiad.wmnet
* 08:29 marostegui: Depool db1084 for schema change
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for schema change', diff saved to https://phabricator.wikimedia.org/P10829 and previous config saved to /var/cache/conftool/dbconfig/20200331-082904-marostegui.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1081 after schema change', diff saved to https://phabricator.wikimedia.org/P10828 and previous config saved to /var/cache/conftool/dbconfig/20200331-082711-marostegui.json
* 08:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:01 XioNoX: delete unused ROA for ARIN v4 prefixes - [[phab:T235886|T235886]]
* 07:49 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:17 vgutierrez: pool cp2034 - [[phab:T248816|T248816]]
* 07:16 marostegui: Deploy schema change on db1081
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 for schema change', diff saved to https://phabricator.wikimedia.org/P10827 and previous config saved to /var/cache/conftool/dbconfig/20200331-071547-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10826 and previous config saved to /var/cache/conftool/dbconfig/20200331-071401-marostegui.json
* 06:48 marostegui: Deploy schema change on db1103:3314
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10825 and previous config saved to /var/cache/conftool/dbconfig/20200331-064707-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10824 and previous config saved to /var/cache/conftool/dbconfig/20200331-064627-marostegui.json
* 05:55 marostegui: Drop nova and nova_api from m5 master (db1133) - [[phab:T248313|T248313]]
* 05:55 kart_: Updated cxserver to 2020-03-30-145349-production ([[phab:T248578|T248578]])
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 05:53 vgutierrez: depool && decommission cp2007 - [[phab:T248941|T248941]]
* 05:48 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:46 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 05:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:46 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 05:26 marostegui: Deploy schema change on db1097:3314
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10822 and previous config saved to /var/cache/conftool/dbconfig/20200331-051354-marostegui.json
* 00:26 eileen: civicrm revision changed from {{Gerrit|cf2e2c11c3}} to {{Gerrit|524b162174}}, config revision is {{Gerrit|708198a154}}


== 2020-03-30 ==
== 2023-05-20 ==
* 23:30 cdanis: cr3-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 18:25 effie: restart varnish cp3061
* 23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
* 23:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Alphabetize wikis in each GrowthExperiments settings (duration: 00m 58s)
* 15:17 hoo@deploy1002: Finished scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] (duration: 08m 47s)
* 23:16 cdanis: cr2-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 15:10 hoo@deploy1002: hoo: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:08 cdanis: cdanis@cr3-knams# commit comment "sensible flow table sizes [[phab:T248394|T248394]]"
* 15:08 hoo@deploy1002: Started scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]]
* 22:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet
* 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Provide wmgSiteLogoIcon (duration: 00m 57s)
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgSiteLogoIcon for each project family and four special wikis (duration: 00m 58s)
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 22:50 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Set wgMobileFrontendLogo from wgLogos['icon'] if set (duration: 00m 59s)
* 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 22:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 57s)
* 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 22:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Split wgLogos setting into wmgSiteLogo1x etc. (duration: 00m 59s)
* 22:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Construct wgLogos in CommonSettings so that projects can inherit values (duration: 01m 02s)
* 19:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 ejegg: updated payments listener (standalone SmashPig) from {{Gerrit|dc0c6b208b}} to {{Gerrit|d80e4c5abd}}
* 15:32 vgutierrez: pool cp2033 - [[phab:T248816|T248816]]
* 15:25 jeh: add icinga 2h downtime and soft reset iDRAC on labstore1005.mgmt.eqiad.wmnet [[phab:T247965|T247965]]
* 14:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:53 vgutierrez: depool & decommission cp2008 - [[phab:T248864|T248864]]
* 14:23 vgutierrez: pool cp2032 - [[phab:T248816|T248816]]
* 14:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:01 vgutierrez: depool & decommission cp2006 - [[phab:T248856|T248856]]
* 13:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:45 vgutierrez: pool cp2031 - [[phab:T248816|T248816]]
* 13:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 13:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 13:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:53 vgutierrez: depool & decommission cp2005 - [[phab:T248848|T248848]]
* 12:26 cdanis: cdanis@re0.cr2-codfw# set chassis fpc 5 inline-services flex-flow-sizing    cdanis@re0.cr2-codfw# commit comment "flex-flow-sizing [[phab:T248394|T248394]]"
* 12:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:21 vgutierrez: depool & decommission cp2004 - [[phab:T248824|T248824]]
* 12:03 XioNoX: delete unused ROA for ARIN v6 prefixes - [[phab:T235886|T235886]]
* 11:59 XioNoX: delete unused ROAs for RIPE prefixes - [[phab:T235886|T235886]]
* 11:42 mutante: miscweb2002 - race condition with apache2 mpm and php7.3 module met - a2dismond mpm_event ; systemctl restart apache2 ; puppet agent -tv (also see [[phab:T196968|T196968]], https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) [[phab:T247887|T247887]]
* 11:37 mutante: miscweb2002 - installed OS, added to puppet, added role and  ... sed -i 's/tin.eqiad/deployment.eqiad/g' /srv/deployment/iegreview/iegreview-cache/.config ([[phab:T247648|T247648]])
* 11:30 marostegui: Deploy schema change on dbstore1004:3314
* 11:22 XioNoX: delete ARIN allocations from RIPE's IRR - [[phab:T235886|T235886]]
* 11:11 Urbanecm: EU SWAT done
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]; take II) (duration: 00m 58s)
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]) (duration: 00m 58s)
* 11:08 vgutierrez: pool cp2030 - [[phab:T248816|T248816]]
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and assoicated talk pages to trwiktionary ([[phab:T248734|T248734]]; take II) (duration: 00m 59s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and assoicated talk pages to trwiktionary ([[phab:T248734|T248734]]) (duration: 00m 59s)
* 10:43 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:34 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 10:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:59 hoo: Temporary modified dumpsgen's crontab on snapshot1008 so that the Wikidata JSON dumps start at 9:59 UTC today ([[phab:T248612|T248612]])
* 09:56 hoo@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/Wikibase/repo/maintenance/DumpEntities.php: DumpEntities: Fix DB group default override ([[phab:T248612|T248612]]) (duration: 01m 02s)
* 09:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 vgutierrez: pool cp2029 - [[phab:T248816|T248816]]
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 08:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 vgutierrez: depool & decommission cp2002 - [[phab:T248818|T248818]]
* 07:48 marostegui: Run cloudcontrol1003:~# wmcs-wikireplica-dns to promote dbproxy1018 to wikireplicas active proxy [[phab:T231520|T231520]]
* 07:40 marostegui: Replace dbproxy1010 with dbproxy1011 for wiki replicas, analytics - [[phab:T231520|T231520]]
* 07:28 marostegui: Deploy schema change on labswiki (wikitech) - [[phab:T248333|T248333]]
* 07:26 marostegui: Deploy schema change on s4 codfw, this will generate lag on codfw - [[phab:T248333|T248333]]
* 07:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 07:10 vgutierrez: depool and decommission cp2001 - [[phab:T248815|T248815]]
* 06:52 vgutierrez: pool cp2028 - [[phab:T247340|T247340]]
* 06:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P10813 and previous config saved to /var/cache/conftool/dbconfig/20200330-062858-marostegui.json
* 06:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:04 marostegui: Deploy schema change on db1074 with replication, this will generate lag on s2 labs
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P10812 and previous config saved to /var/cache/conftool/dbconfig/20200330-060338-marostegui.json
* 05:40 vgutierrez: pool cp2027 - [[phab:T247340|T247340]]
* 05:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 04:55 vgutierrez: Enable TLS Session tickets in ulsfo - [[phab:T245616|T245616]]
* 04:32 vgutierrez: upgrade ATS to version 8.0.6-1wm4 on ulsfo - [[phab:T245616|T245616]]


== 2020-03-29 ==
== 2023-05-19 ==
* 08:24 elukey: powercycle elastic1059 - mgmt/serial console stuck, no ssh - racadm getsel shows a lot of OEM errors occurred, nothing specific
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 21:19 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1495.eqiad.wmnet
* 19:46 mutante: mw1469 - sudo pkill ffmpeg (per runbook)
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1469.eqiad.wmnet
* 19:45 mutante: depooled mw1469 from videoscaler, dedicating to just jobrunner
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1469.eqiad.wmnet
* 19:36 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b34c529]: (no justification provided) (duration: 00m 09s)
* 19:36 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b34c529]: (no justification provided)
* 16:55 mutante: mw2448 - scap pull - [[phab:T2334429|T2334429]]
* 15:31 taavi@deploy1002: Finished scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] (duration: 22m 02s)
* 15:21 taavi@deploy1002: legoktm and taavi: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:09 taavi@deploy1002: Started scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]]
* 15:06 legoktm@deploy1002: Finished scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] (duration: 09m 46s)
* 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
* 14:58 legoktm@deploy1002: legoktm: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 14:57 legoktm@deploy1002: Started scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]]
* 14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 14:35 sukhe: enable puppet on A:lvs, finished rolling out change
* 14:20 sukhe: disable puppet on A:lvs to roll out CR 910566
* 14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 13:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 10s)
* 13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
* 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad ([[phab:T322937|T322937]])
* 13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
* 11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
* 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
* 10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 10:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 10:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
* 10:07 moritzm: installing ncurses security updates
* 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:15 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
* 08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
* 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
* 08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
* 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
* 08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia [[phab:T330884|T330884]]
* 08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
* 07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
* 07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
* 07:11 moritzm: installing emacs security updates
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
* 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
* 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48381 and previous config saved to /var/cache/conftool/dbconfig/20230519-062520-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48380 and previous config saved to /var/cache/conftool/dbconfig/20230519-062435-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48379 and previous config saved to /var/cache/conftool/dbconfig/20230519-061228-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48378 and previous config saved to /var/cache/conftool/dbconfig/20230519-061016-root.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48377 and previous config saved to /var/cache/conftool/dbconfig/20230519-060931-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48376 and previous config saved to /var/cache/conftool/dbconfig/20230519-055723-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48375 and previous config saved to /var/cache/conftool/dbconfig/20230519-055511-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48374 and previous config saved to /var/cache/conftool/dbconfig/20230519-055426-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2027', diff saved to https://phabricator.wikimedia.org/P48373 and previous config saved to /var/cache/conftool/dbconfig/20230519-054952-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 to es3 master', diff saved to https://phabricator.wikimedia.org/P48372 and previous config saved to /var/cache/conftool/dbconfig/20230519-054923-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P48371 and previous config saved to /var/cache/conftool/dbconfig/20230519-054758-root.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2033 to es2 master', diff saved to https://phabricator.wikimedia.org/P48370 and previous config saved to /var/cache/conftool/dbconfig/20230519-054737-marostegui.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P48369 and previous config saved to /var/cache/conftool/dbconfig/20230519-054503-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master', diff saved to https://phabricator.wikimedia.org/P48368 and previous config saved to /var/cache/conftool/dbconfig/20230519-054403-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1121 from dbctl [[phab:T336725|T336725]]', diff saved to https://phabricator.wikimedia.org/P48367 and previous config saved to /var/cache/conftool/dbconfig/20230519-053719-marostegui.json


== 2020-03-28 ==
== 2023-05-18 ==
* 16:54 elukey: restart yarn on analytics1071
* 23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 12:05 vgutierrez: preemptive restart of ats-tls on cp1081 and cp3062 - [[phab:T248736|T248736]]
* 22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 11:32 vgutierrez: restart ats-tls on cp1077 - [[phab:T248736|T248736]]
* 22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress
* 08:34 vgutierrez: pool cp1089
* 22:20 mutante: short down-time for zuul-merger on contint2001
* 08:30 vgutierrez: restarting ats-tls on cp1089
* 21:47 mutante: maintenance for zuul (CI) on contint servers
* 21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 21:13 brennen@deploy1002: Finished scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] (duration: 09m 38s)
* 21:05 brennen@deploy1002: brennen: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:03 brennen@deploy1002: Started scap: Backport for [[gerrit:920744{{!}}cache: Do not throw on empty set in LinkBatch::constructSet (T336964)]]
* 21:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] (duration: 08m 09s)
* 20:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:53 urbanecm@deploy1002: Started scap: Backport for [[gerrit:920743{{!}}Silently ignore istype-depicts image suggestion type (T336962)]]
* 20:36 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:33 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 20:16 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] (duration: 10m 25s)
* 20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921059{{!}}Reverts hewiki A/B test (T335309)]]
* 18:57 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]] (duration: 00m 35s)
* 18:56 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@502ddae]: [[phab:T333001|T333001]]
* 18:55 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8  refs [[phab:T330215|T330215]]
* 18:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
* 18:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9  refs [[phab:T330215|T330215]]
* 18:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T332355|T332355]]
* 18:07 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 18:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:59 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - [[phab:T274204|T274204]]
* 17:38 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:37 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:36 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:35 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 17:27 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:55 XioNoX: push new pfw policies - [[phab:T336896|T336896]]
* 16:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
* 15:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 15:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 15:57 inflatador: bking@cumin1001 starting rolling restart of wcqs for java updates [[phab:T334470|T334470]]
* 15:53 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 15:50 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
* 15:47 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) (duration: 00m 10s)
* 15:47 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided)
* 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
* 15:37 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
* 15:31 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
* 15:25 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
* 15:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
* 15:19 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 15:18 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:18 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:17 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
* 15:15 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
* 15:09 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
* 15:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
* 15:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
* 15:03 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) (duration: 00m 06s)
* 15:02 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided)
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 14:31 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 14:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:01 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-codfw
* 13:59 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 13:52 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 13:50 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 13:47 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 13:18 TheresNoTime: closing backport window
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] (duration: 08m 45s)
* 13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:07 samtar@deploy1002: samtar and s-mukuti: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:919023{{!}}InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)]]
* 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipepine - [[phab:T332012|T332012]] (duration: 06m 19s)
* 12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
* 12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
* 12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipepine - [[phab:T332012|T332012]] (duration: 07m 00s)
* 12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it
* 12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
* 12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
* 12:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
* 12:35 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 12:34 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 12:28 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
* 12:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
* 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
* 12:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
* 12:12 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS