Server Admin Log: Difference between revisions

imported>Stashbot
(mutante: using planet1001 to manually hack APT sources to test new apt1001.wikimedia.org)
imported>Stashbot
(zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in group1 wikis (T299954) (duration: 08m 00s))
== 2020-02-28 ==
== 2023-05-30 ==
* 21:31 mutante: using planet1001 to manually hack APT sources to test new apt1001.wikimedia.org
* 23:38 zabe@deploy1002: Finished scap: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]] (duration: 08m 00s)
* 20:29 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:31 zabe@deploy1002: zabe: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:26 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:30 zabe@deploy1002: Started scap: Backport for [[gerrit:924564{{!}}Start reading from rev_comment_id in group1 wikis (T299954)]]
* 19:01 milimetric@deploy1001: Finished deploy [analytics/refinery@0fc392f] (thin): Hotfix: going back to a safe version of geo udf (duration: 00m 07s)
* 22:22 ejegg: civicrm upgraded from {{Gerrit|415aa7e5}} to {{Gerrit|5905a403}}
* 19:01 milimetric@deploy1001: Started deploy [analytics/refinery@0fc392f] (thin): Hotfix: going back to a safe version of geo udf
* 21:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] (duration: 07m 48s)
* 19:01 milimetric@deploy1001: Finished deploy [analytics/refinery@0fc392f]: Hotfix: going back to a safe version of geo udf (duration: 13m 06s)
* 21:50 samtar@deploy1002: jforrester and samtar: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 18:47 milimetric@deploy1001: Started deploy [analytics/refinery@0fc392f]: Hotfix: going back to a safe version of geo udf
* 21:48 samtar@deploy1002: Started scap: Backport for [[gerrit:924570{{!}}linker: Check for null parser in Linker::makeThumbLink2 (T337794)]]
* 16:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:58 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:924569{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 20:57 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924569{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]]
* 16:05 oblivian@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: cluster=kibana,service=kibana-next
* 20:40 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] (duration: 09m 27s)
* 15:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:32 ladsgroup@deploy1002: ladsgroup: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 15:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 20:30 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924568{{!}}Add WANCache to ParserOutputPageProperties::finalize (T336698)]]
* 15:39 moritzm: installing libperl4-corelibs-perl updates from Stretch point release
* 20:12 inflatador: bking@wdqs2009 depool wdqs2009 until it catches up with lag
* 15:36 elukey@deploy1001: Finished deploy [analytics/refinery@28fa2fc]: fix for refinery-drop-older-than - part 2 (duration: 13m 40s)
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]] (duration: 08m 46s)
* 15:24 marostegui: Stop replication on db1077 from db1111 (its master) - [[phab:T246447|T246447]]
* 20:03 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:22 elukey@deploy1001: Started deploy [analytics/refinery@28fa2fc]: fix for refinery-drop-older-than - part 2
* 20:01 samtar@deploy1002: Started scap: Backport for [[gerrit:924536{{!}}Turn on A/B Test Hebrew (T336969)]]
* 14:17 gehel: rolling restart of elasticsearch/eqiad for JVM upgrade completed
* 19:48 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. [[phab:T335305|T335305]]. (duration: 00m 09s)
* 14:16 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 19:48 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. [[phab:T335305|T335305]].
* 14:15 elukey@deploy1001: Finished deploy [analytics/refinery@2db36f4]: Fix refinery-drop-older-than script (duration: 14m 01s)
* 19:36 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 04m 02s)
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 100 to 300', diff saved to https://phabricator.wikimedia.org/P10558 and previous config saved to /var/cache/conftool/dbconfig/20200228-141035-marostegui.json
* 19:32 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 14:01 elukey@deploy1001: Started deploy [analytics/refinery@2db36f4]: Fix refinery-drop-older-than script
* 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 54s)
* 13:58 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 19:29 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 13:32 marostegui: Reset idrac from db1114
* 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 16m 36s)
* 12:11 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 19:24 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
* 12:06 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 19:12 inflatador: [WDQS Deploy] Deploying version 0.3.124
* 11:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 19:11 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
* 11:04 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-restart (exit_code=97)
* 18:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.11  refs [[phab:T337525|T337525]]
* 10:53 jynus: labsdb1009-12 prometheus metrics restored after 90 minutes of unscheduled unavailability
* 17:45 mutante: re-enabling puppet on contint2001
* 10:27 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 16:20 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyEdited
* 10:15 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 16:19 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyRegistered
* 10:13 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 16:14 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:924053{{!}}[Growth] Enable user impact refresh on 10 more wikis (T336203)]] (duration: 07m 08s)
* 10:01 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
* 16:07 urbanecm@deploy1002: Started scap: Backport for [[gerrit:924053{{!}}[Growth] Enable user impact refresh on 10 more wikis (T336203)]]
* 09:59 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 16:00 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:59 gehel: starting rolling restart of elasticsearch/eqiad for JVM upgrade
* 16:00 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1101:3318 from vslow,dump', diff saved to https://phabricator.wikimedia.org/P10555 and previous config saved to /var/cache/conftool/dbconfig/20200228-093653-marostegui.json
* 15:58 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into vslow,dump as it was there originally', diff saved to https://phabricator.wikimedia.org/P10554 and previous config saved to /var/cache/conftool/dbconfig/20200228-092631-marostegui.json
* 15:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after moving labs hosts back under it', diff saved to https://phabricator.wikimedia.org/P10553 and previous config saved to /var/cache/conftool/dbconfig/20200228-092453-marostegui.json
* 15:57 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:21 jynus: removed leftover labs prometheus target files from ops at prometheus1003, prometheus1004
* 15:56 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:44 moritzm: installing openssh updates from buster point release
* 15:56 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:44 addshore: END warming wikidata term cache on db1126 for Q6-8 million [[phab:T219123|T219123]] (pass2 today)
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:30 moritzm: installing mariadb-10.3 update from buster point release (just client-side libs and tools, no mysqlds)
* 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:24 moritzm: installing cups updates from buster point release
* 15:54 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:22 marostegui: Stop db1087 and db2079 in sync
* 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 to move labs hosts back under it', diff saved to https://phabricator.wikimedia.org/P10551 and previous config saved to /var/cache/conftool/dbconfig/20200228-082213-marostegui.json
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:12 addshore: START warming wikidata term cache on db1126 for Q6-8 million [[phab:T219123|T219123]] (pass2 today) (pass1 just finished)
* 15:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2002.codfw.wmnet with OS bullseye
* 08:05 moritzm: installing systemd bugfix update from Buster point release
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 07:38 addshore: START warming wikidata term cache on db1126 for Q6-8 million [[phab:T219123|T219123]] (pass1 today)
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 07:31 moritzm: installing gnutls28 bugfix update from Buster point release
* 15:49 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084 - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10550 and previous config saved to /var/cache/conftool/dbconfig/20200228-064037-marostegui.json
* 15:49 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): '75% of original weight to db1084 - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10549 and previous config saved to /var/cache/conftool/dbconfig/20200228-062536-marostegui.json
* 15:15 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
* 06:04 mutante: rsyncing APT repo and firmware data from install1002 to apt2001
* 15:15 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 05:58 mutante: apt2001 - signed puppet cert, initial run after OS install, rsyncing repo data, not in use yet
* 15:14 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
* 01:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 00m 56s)
* 15:10 tgr_: UTC evening deploys done
* 01:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T196466|T196466]] [wikitech] Remove the 'shell' user right from assignment and rights lists (duration: 00m 58s)
* 15:08 tgr@deploy1002: Finished scap: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]] (duration: 08m 08s)
* 01:15 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:05 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 01:05 James_F: Running mwscript emptyUserGroup.php --wiki=labswiki shell for [[phab:T196466|T196466]]
* 15:03 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 15:02 tgr@deploy1002: tgr and matmarex: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 15:00 tgr@deploy1002: Started scap: Backport for [[gerrit:924160{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]], [[gerrit:924456{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]], [[gerrit:924458{{!}}ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)]]
* 14:50 moritzm: installing texlive-bin security updates
* 14:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
* 14:46 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
* 14:36 tgr@deploy1002: Finished scap: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]] (duration: 08m 01s)
* 14:29 tgr@deploy1002: matmarex and tgr: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:28 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
* 14:27 tgr@deploy1002: Started scap: Backport for [[gerrit:924159{{!}}Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)]]
* 14:16 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mwlog2002.codfw.wmnet with OS bullseye
* 14:16 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetdb1003.eqiad.wmnet with OS bookworm
* 14:14 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:13 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:08 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 14:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 14:06 moritzm: installing libwebp security updates
* 14:06 tgr@deploy1002: Finished scap: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]] (duration: 08m 14s)
* 13:59 tgr@deploy1002: tgr and matmarex: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:58 tgr@deploy1002: Started scap: Backport for [[gerrit:924158{{!}}editpage: Change the order of hooks slightly for FlaggedRevs (T337637)]]
* 13:57 tgr@deploy1002: Finished scap: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]] (duration: 16m 13s)
* 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2009.codfw.wmnet
* 13:55 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
* 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS bullseye
* 13:42 tgr@deploy1002: tgr and daimona: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:40 tgr@deploy1002: Started scap: Backport for [[gerrit:924488{{!}}prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)]]
* 13:33 mlitn@deploy1002: Finished scap: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]] (duration: 07m 39s)
* 13:27 mlitn@deploy1002: mlitn: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:25 mlitn@deploy1002: Started scap: Backport for [[gerrit:924454{{!}}Fix maxJobs default]], [[gerrit:924455{{!}}Fix maxJobs default]]
* 13:20 tgr@deploy1002: Finished scap: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]] (duration: 09m 26s)
* 13:13 tgr@deploy1002: tgr: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:11 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
* 13:11 tgr@deploy1002: Started scap: Backport for [[gerrit:924079{{!}}GrowthExperiments: Re-add $wgGERestbaseUrl]]
* 13:09 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
* 13:09 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 13:09 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 13:08 bblack: lvs1018: restart pybal for wikireplicas monitoring removal
* 13:08 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 13:06 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 13:06 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
* 13:06 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 13:03 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
* 13:00 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
* 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS bullseye
* 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:29 volans: disabling puppet where cadvisor is present
* 12:14 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
* 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
* 11:50 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
* 11:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 11:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 11:46 slyngshede@cumin1001: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 11:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
* 11:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
* 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:14 hashar@deploy1002: Finished deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - [[phab:T331651|T331651]] (duration: 00m 08s)
* 11:14 hashar@deploy1002: Started deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - [[phab:T331651|T331651]]
* 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2006.codfw.wmnet with OS bookworm
* 11:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
* 10:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 10:50 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
* 10:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 10:41 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 10:11 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard2003.codfw.wmnet with OS bookworm
* 10:11 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 10:00 zabe@deploy1002: Finished scap: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]] (duration: 08m 12s)
* 09:59 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
* 09:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
* 09:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
* 09:54 zabe@deploy1002: zabe: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 09:52 zabe@deploy1002: Started scap: Backport for [[gerrit:924469{{!}}Start reading from rev_comment_id in group0 wikis (T299954)]]
* 09:52 zabe@deploy1002: Finished scap: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]] (duration: 09m 52s)
* 09:49 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
* 09:46 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
* 09:43 zabe@deploy1002: zabe: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:43 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bookworm
* 09:42 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:42 zabe@deploy1002: Started scap: Backport for [[gerrit:923635{{!}}Check for null when using ::getCheckUserHelperFieldset (T337599)]]
* 09:40 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 09:37 zabe@deploy1002: Finished scap: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]] (duration: 07m 48s)
* 09:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
* 09:33 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:33 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:32 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
* 09:30 zabe@deploy1002: zabe: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:30 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:30 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bookworm
* 09:29 zabe@deploy1002: Started scap: Backport for [[gerrit:922492{{!}}Start reading from rev_comment_id in test wikis (T299954)]]
* 09:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:24 tgr@deploy1002: Finished scap: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]] (duration: 08m 57s)
* 09:22 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 09:20 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 09:19 arturo: run aborrero@cumin1001:~ 2s 98 $ sudo cumin "P<nowiki>{</nowiki>R:Profile::Mariadb::Section = 's7'<nowiki>}</nowiki> and P<nowiki>{</nowiki>P:wmcs::db::wikireplicas::mariadb_multiinstance<nowiki>}</nowiki>" "/usr/local/sbin/maintain-meta_p --all-databases --bootstrap"
* 09:17 tgr@deploy1002: tgr: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 09:15 tgr@deploy1002: Started scap: Backport for [[gerrit:924361{{!}}Improve handling of missing image recommendation]]
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:14 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:13 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 09:11 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 09:11 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 09:06 tgr@deploy1002: Finished scap: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]] (duration: 14m 22s)
* 09:00 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 09:00 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:59 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:54 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:53 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:53 tgr@deploy1002: tgr: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 08:52 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:51 tgr@deploy1002: Started scap: Backport for [[gerrit:923644{{!}}Section images: Do not treat unexpected kinds as production errors]]
* 08:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:50 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:49 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:49 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:48 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:43 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:41 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:39 tgr@deploy1002: Finished scap: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]] (duration: 10m 30s)
* 08:39 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:39 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:36 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:36 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:35 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:34 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:33 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:31 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:31 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:30 tgr@deploy1002: tgr: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:29 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:29 tgr@deploy1002: Started scap: Backport for [[gerrit:923643{{!}}Improve logging of invalid image recommendation kinds]]
* 08:29 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:27 jayme: re-enable puppet on P:kubernetes::node for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
* 08:25 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:20 jayme: disable puppet on P:kubernetes::node (apart from staging-codfw) for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 08:15 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:14 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 08:12 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 08:12 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
* 08:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
* 08:08 tgr@deploy1002: Finished scap: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]] (duration: 07m 51s)
* 08:01 tgr@deploy1002: tgr: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:00 tgr@deploy1002: Started scap: Backport for [[gerrit:924356{{!}}Section images: Accept more recommendation types]]
* 07:56 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]] (duration: 09m 17s)
* 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb2003.codfw.wmnet with OS bookworm
* 07:48 ladsgroup@deploy1002: func and ladsgroup: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:46 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:924086{{!}}Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)]]
* 07:45 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:45 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48633 and previous config saved to /var/cache/conftool/dbconfig/20230530-074445-root.json
* 07:44 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:42 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:41 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:40 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:38 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:38 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:31 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:31 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:30 moritzm: move LDAP permissions for hghani from cn=nda to cn=wmf [[phab:T322145|T322145]]
* 07:30 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48632 and previous config saved to /var/cache/conftool/dbconfig/20230530-072941-root.json
* 07:29 kartik@deploy1002: Finished scap: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]] (duration: 09m 38s)
* 07:28 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:21 kartik@deploy1002: kartik: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:19 kartik@deploy1002: Started scap: Backport for [[gerrit:924050{{!}}testwiki: Enable Section Translation for 9 Wikipedia (T337290)]]
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:17 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:16 kartik@deploy1002: Finished scap: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]] (duration: 11m 49s)
* 07:16 moritzm: update bookworm installer to rc4 [[phab:T330495|T330495]]
* 07:16 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48630 and previous config saved to /var/cache/conftool/dbconfig/20230530-071436-root.json
* 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:10 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:09 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 07:06 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:923527{{!}}Undeploy Special:Contribute from unsupported skins (T337366)]]
* 07:04 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 07:03 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 07:02 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 07:02 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 07:01 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48629 and previous config saved to /var/cache/conftool/dbconfig/20230530-065932-root.json
* 06:58 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 06:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
* 06:51 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:50 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
* 06:48 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
* 06:48 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48628 and previous config saved to /var/cache/conftool/dbconfig/20230530-064427-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48625 and previous config saved to /var/cache/conftool/dbconfig/20230530-062922-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48624 and previous config saved to /var/cache/conftool/dbconfig/20230530-061417-root.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48623 and previous config saved to /var/cache/conftool/dbconfig/20230530-055913-root.json
* 05:43 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 62597
* 05:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
* 05:41 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 1255 hosts
* 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 1255 hosts
* 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 784 hosts
* 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 784 hosts
* 05:28 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 784 hosts
* 05:27 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 784 hosts
* 05:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 1255 hosts
* 05:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 1255 hosts
* 05:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62597
* 05:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
* 04:28 kart_: Updated cxserver to 2023-05-29-112644-production ([[phab:T337657|T337657]])
* 04:28 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 04:27 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 04:24 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 04:24 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 04:21 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 04:20 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.9 (duration: 02m 10s)
* 03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.11 refs [[phab:T337525|T337525]] (duration: 49m 54s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.11 refs [[phab:T337525|T337525]]


== 2020-02-27 ==
== 2023-05-29 ==
* 23:53 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:19 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
* 23:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgLogos['wordmark'] based on wgMinervaCustomLogos, never set (duration: 00m 56s)
* 15:19 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
* 23:07 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 00m 56s)
* 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 23:04 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Merge wgMinervaCustomLogos into wgLogos, take 2 (duration: 00m 56s)
* 14:18 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 23:01 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Only try to set wgLogos['wordmark'] if not already done (duration: 00m 58s)
* 13:57 vgutierrez@puppetmaster1001: conftool action : set/weight=10; selector: name=dbproxy.*,dc=eqiad
* 22:49 James_F: Manually ran `scap pull` on mw1349 and mw1351 as they were emitting odd errors.
* 11:25 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 22:06 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@5a67e6e]: AQS: Minor fix take 3 (duration: 07m 24s)
* 11:24 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 21:59 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@5a67e6e]: AQS: Minor fix take 3
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48618 and previous config saved to /var/cache/conftool/dbconfig/20230529-112242-root.json
* 21:53 effie: depool mw1262, suspecting it might have overloaded logstash
* 11:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 21:51 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@c70b338]: AQS: Minor fix take 2 (duration: 02m 59s)
* 11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 21:50 shdubsh: start elasticsearch on logstash1010
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48617 and previous config saved to /var/cache/conftool/dbconfig/20230529-110737-root.json
* 21:48 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@c70b338]: AQS: Minor fix take 2
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48616 and previous config saved to /var/cache/conftool/dbconfig/20230529-105233-root.json
* 21:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Roll back to setting wgMinervaCustomLogos (duration: 00m 33s)
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48615 and previous config saved to /var/cache/conftool/dbconfig/20230529-103728-root.json
* 21:42 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Use the four dblists again (duration: 00m 33s)
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48614 and previous config saved to /var/cache/conftool/dbconfig/20230529-102223-root.json
* 21:40 jforrester@deploy1001: Synchronized dblists/: Re-establish dblists everywhere (duration: 00m 33s)
* 10:07 vgutierrez: restarting pybal on lvs1018
* 21:39 jforrester@deploy1001: scap failed: average error rate on 11/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48612 and previous config saved to /var/cache/conftool/dbconfig/20230529-100719-root.json
* 21:25 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Touch the dblists list (duration: 00m 56s)
* 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 21:22 jforrester@deploy1001: Scap failed!: 8/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 10:05 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 21:19 jforrester@deploy1001: Scap failed!: 10/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 21:19 jforrester@deploy1001: Scap failed!: 10/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 10:05 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 21:16 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@c70b338]: AQS: Minor fix (duration: 02m 30s)
* 10:04 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 21:14 jforrester@deploy1001: Synchronized multiversion/MWWikiversions.php: Drop references to four dblists to canaries too (duration: 00m 55s)
* 10:04 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 21:13 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@c70b338]: AQS: Minor fix
* 10:03 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 21:13 jforrester@deploy1001: Synchronized dblists/: Add back the deleted dblists to make the canaries quiet (duration: 00m 56s)
* 10:03 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 21:11 jforrester@deploy1001: sync-file aborted: Drop references to four dblists (duration: 00m 05s)
* 10:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 21:11 jforrester@deploy1001: Synchronized multiversion/MWWikiversions.php: Drop references to four dblists (duration: 00m 35s)
* 10:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 21:10 jforrester@deploy1001: Scap failed!: 10/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 10:00 vgutierrez: restarting pybal on lvs1020
* 21:07 jforrester@deploy1001: Scap failed!: 10/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 09:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 21:04 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 00m 56s)
* 09:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 21:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Merge wgMinervaCustomLogos into wgLogos (duration: 00m 57s)
* 09:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 20:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 20:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48611 and previous config saved to /var/cache/conftool/dbconfig/20230529-095214-root.json
* 20:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 20:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 09:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 20:22 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 20:22 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 20:21 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 20:21 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 20:16 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48610 and previous config saved to /var/cache/conftool/dbconfig/20230529-093709-root.json
* 20:16 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 20:14 effie: pool mw1262
* 09:31 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 20:07 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 20:07 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:29 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 20:05 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.21  refs [[phab:T233869|T233869]]
* 09:13 godog: start partial rollout of cadvisor to eqiad/codfw (~10%) [[phab:T108027|T108027]]
* 20:00 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48609 and previous config saved to /var/cache/conftool/dbconfig/20230529-090216-root.json
* 20:00 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48608 and previous config saved to /var/cache/conftool/dbconfig/20230529-084711-root.json
* 19:46 mutante: Welcome new deployers Thalia Chan, Moriel Schottlender and Dayllan Maza (Anti-Harassment-Tools team)
* 08:45 godog: delete old raw blocks from thanos - [[phab:T337236|T337236]]
* 19:38 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48607 and previous config saved to /var/cache/conftool/dbconfig/20230529-083206-root.json
* 19:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48606 and previous config saved to /var/cache/conftool/dbconfig/20230529-081702-root.json
* 19:21 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: once more for good measure (duration: 01m 03s)
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48604 and previous config saved to /var/cache/conftool/dbconfig/20230529-080157-root.json
* 19:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574634{{!}}Enable articletopic: search keyword in CirrusSearch (T240559)]] (duration: 01m 05s)
* 07:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 19:17 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 07:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 19:17 effie: depool mw1262
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48603 and previous config saved to /var/cache/conftool/dbconfig/20230529-074653-root.json
* 19:17 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48602 and previous config saved to /var/cache/conftool/dbconfig/20230529-073148-root.json
* 19:17 mutante: ganeti2001 - removing VM apt2001 to re-create it after IP change
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48601 and previous config saved to /var/cache/conftool/dbconfig/20230529-071643-root.json
* 19:13 milimetric@deploy1001: Finished deploy [analytics/refinery@357ff5c] (thin): Refinery using 0.0.115 (duration: 00m 07s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s2, s3, s5 [[phab:T337446|T337446]]', diff saved to https://phabricator.wikimedia.org/P48598 and previous config saved to /var/cache/conftool/dbconfig/20230529-051043-root.json
* 19:12 milimetric@deploy1001: Started deploy [analytics/refinery@357ff5c] (thin): Refinery using 0.0.115
* 19:06 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:51 elukey: upgrade prometheus-mcrouter-exporter to 0.1.0+git20200227-1 on hosts
* 18:48 milimetric@deploy1001: Finished deploy [analytics/refinery@357ff5c]: Refinery using 0.0.115 (duration: 10m 11s)
* 18:43 mutante: adding parse2* machines to puppet
* 18:37 milimetric@deploy1001: Started deploy [analytics/refinery@357ff5c]: Refinery using 0.0.115
* 18:31 volans: restarting icinga on icinga1001, command file randomly discarding commands
* 18:21 addshore: END warming wikidata term cache on db1126 for Q6-8 million [[phab:T219123|T219123]] (pass1) (will do 2 more passes tomorrow)
* 18:20 elukey: upload prometheus-mcrouter-exporter 0.1.0+git20200227-1 to stretch-wikimedia
* 17:52 addshore: resume item migration script at Q50 million [[phab:T219123|T219123]] (batch size of 100, 1s sleep)
* 17:49 ebernhardson: delete commonswiki_file_1582685980 from cloudelastic-chi, reindex failed and commonswiki_file_first is still primary
* 17:41 effie: enable puppet on thumbor*
* 17:40 effie: stop and mask all nginx on thumbor*
* 17:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:34 volans@cumin1001: START - Cookbook sre.hosts.downtime
* 17:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:31 addshore: START warming wikidata term cache on db1126 for Q6-8 million [[phab:T219123|T219123]] (pass1)
* 17:31 vgutierrez: (from 17:03) reimage lvs5003 with buster - [[phab:T245984|T245984]]
* 17:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1087 at 20% [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10547 and previous config saved to /var/cache/conftool/dbconfig/20200227-173017-jynus.json
* 17:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Reading up to Q6M for the new term store for clients (was Q4M) + warm db1126 caches ([[phab:T219123|T219123]]) cache bust (duration: 01m 04s)
* 17:19 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Reading up to Q6M for the new term store for clients (was Q4M) + warm db1126 caches ([[phab:T219123|T219123]]) (duration: 01m 04s)
* 17:18 addshore: (relog FROM 5:11) END warming wikidata term cache on db1126 for Q4-6 million [[phab:T219123|T219123]] (pass2)
* 16:55 vgutierrez: re-enable BGP in lvs4005 - [[phab:T245984|T245984]]
* 16:50 volans: temporarily uncommented external check for icinga2001. Restarting Icinga on icinga2001
* 16:49 addshore: START warming wikidata term cache on db1126 for Q4-6 million [[phab:T219123|T219123]] (pass2)
* 16:49 addshore: END warming wikidata term cache on db1126 for Q4-6 million [[phab:T219123|T219123]] (pass1)
* 16:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:24 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:22 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:21 moritzm: installing wget security updates on jessie
* 16:20 vgutierrez: reimage lvs4005 with buster - [[phab:T245984|T245984]]
* 16:12 papaul: rebooting parse2009 to clear memory error
* 16:11 Urbanecm: foreachwiki extensions/AbuseFilter/maintenance/fixOldLogEntries.php --verbose started ([[phab:T228655|T228655]])
* 16:10 Urbanecm: mwscript extensions/AbuseFilter/maintenance/fixOldLogEntries.php --wiki=mediawikiwiki --verbose ([[phab:T228655|T228655]])
* 16:10 vgutierrez: re-enable BGP in lvs4006 - [[phab:T245984|T245984]]
* 16:09 addshore: begin warming wikidata term cache on db1126 for Q4-6 million [[phab:T219123|T219123]]
* 16:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Reading up to Q4M for the new term store for clients (was Q2M) + warm db1126 caches ([[phab:T219123|T219123]]) cache bust (duration: 01m 04s)
* 16:05 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Reading up to Q4M for the new term store for clients (was Q2M) + warm db1126 caches ([[phab:T219123|T219123]]) (duration: 01m 04s)
* 16:05 moritzm: installing python3.7 security updates on Buster
* 16:02 effie: disable puppet on thumbor*
* 15:59 moritzm: installing e2fsck security updates on buster
* 15:56 moritzm: installing python-django updates (packaged Debian version)
* 15:52 moritzm: installing python-pysaml security updates
* 15:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:32 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/ConfirmEdit/includes/auth/CaptchaPreAuthenticationProvider.php: [[phab:T245280|T245280]] (duration: 01m 04s)
* 15:31 reedy@deploy1001: Synchronized php-1.35.0-wmf.21/extensions/ConfirmEdit/includes/auth/CaptchaPreAuthenticationProvider.php: [[phab:T245280|T245280]] (duration: 01m 05s)
* 15:29 moritzm: restarting mw canaries to pick up curl update
* 15:23 moritzm: installing curl security updates on stretch/buster
* 15:17 vgutierrez: reimage lvs4006 with buster - [[phab:T245984|T245984]]
* 15:03 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1084 at 50% [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10542 and previous config saved to /var/cache/conftool/dbconfig/20200227-150302-jynus.json
* 14:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 vgutierrez: reimage lvs4007 with buster - [[phab:T245984|T245984]]
* 14:09 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: {{Gerrit|7e3a57a}}: Increase arwiki WikiGap throttle lift to 400 accounts ([[phab:T246092|T246092]]) (duration: 01m 05s)
* 13:28 _joe_: installing envoy in eqiad too
* 13:13 cdanis: s/camping/clamping/
* 13:11 XioNoX: esams/knams rollback tcp-mss camping and prepending
* 13:07 _joe_: restarting envoy, after chowning the log files, on all codfw mw servers where it was installed
* 13:06 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q2M (was Q8M) again ([[phab:T219123|T219123]]) ?cachebust (duration: 01m 03s)
* 13:05 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q2M (was Q8M) again ([[phab:T219123|T219123]]) (duration: 01m 03s)
* 13:03 _joe_: re-stopped puppet on codfw
* 12:56 XioNoX: delete specific tcp-mss on cr2-eqiad:equinix (will cause an interface flap) - [[phab:T244610|T244610]]
* 12:41 XioNoX: bump BGP prefix-limit on all routers - [[phab:T246110|T246110]]
* 12:38 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8M (was Q6M) again ([[phab:T219123|T219123]]) ?cachebust (duration: 01m 03s)
* 12:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8M (was Q6M) again ([[phab:T219123|T219123]]) (duration: 01m 04s)
* 12:27 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6M (was Q2M) again ([[phab:T219123|T219123]]) cachebust? (duration: 01m 17s)
* 12:24 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6M (was Q2M) again ([[phab:T219123|T219123]]) (duration: 01m 45s)
* 12:20 vgutierrez@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:19 vgutierrez@cumin2001: START - Cookbook sre.hosts.decommission
* 12:18 vgutierrez@cumin2001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 12:18 vgutierrez@cumin2001: START - Cookbook sre.hosts.decommission
* 12:14 vgutierrez: replace lvs2003 with lvs2009 - [[phab:T196560|T196560]] [[phab:T245984|T245984]] [[phab:T246334|T246334]]
* 12:11 Urbanecm: EU SWAT done
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|daee105}}: Add ids.si.edu to the wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T246330|T246330]]; take II) (duration: 01m 04s)
* 12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|daee105}}: Add ids.si.edu to the wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T246330|T246330]]) (duration: 01m 05s)
* 11:48 vgutierrez: run decommission script against lvs2006.codfw.wmnet - [[phab:T246329|T246329]]
* 11:47 vgutierrez@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:47 vgutierrez@cumin2001: START - Cookbook sre.hosts.decommission
* 11:45 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1084 at 10% [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10538 and previous config saved to /var/cache/conftool/dbconfig/20200227-114542-jynus.json
* 11:35 addshore: pause item migration script at Q50 million [[phab:T219123|T219123]]
* 11:02 vgutierrez: start pybal on lvs2003 - [[phab:T196560|T196560]] [[phab:T245984|T245984]]
* 10:58 vgutierrez: stop pybal on lvs2003 to let lvs2010 take the traffic for a little bit - [[phab:T196560|T196560]] [[phab:T245984|T245984]]
* 10:54 vgutierrez: replacing lvs2006 with lvs2010 - [[phab:T196560|T196560]] [[phab:T245984|T245984]]
* 09:35 jynus: upgrade and restart db1084 [[phab:T246323|T246323]]
* 09:03 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1098 (s6 & s7)', diff saved to https://phabricator.wikimedia.org/P10536 and previous config saved to /var/cache/conftool/dbconfig/20200227-090344-jynus.json
* 08:26 jynus: killed SpecialFewestRevisions::reallyDoQuery long running query on db1101:s8, causing lag
* 08:14 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1098 at 50%', diff saved to https://phabricator.wikimedia.org/P10535 and previous config saved to /var/cache/conftool/dbconfig/20200227-081449-jynus.json
* 03:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:28 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:26 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:24 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:22 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:45 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:27 XioNoX: re-enable BGP to telia in esams
* 01:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:56 cdanis: repool esams 🙌 😎
* 00:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:21 jforrester@deploy1001: Synchronized w/extract2.php: [[phab:T239975|T239975]]: Use Article::getPage()->getTouched(), not Article::getTouched (duration: 01m 04s)
* 00:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 04s)
* 00:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T232140|T232140]]: Merge definition of wgLogos and wgLogo (duration: 01m 04s)
* 00:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T232140|T232140]]: Stop setting wgLogoHD from wgLogos (duration: 01m 05s)
* 00:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 03s)
* 00:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T246212|T246212]] Stop setting wgULSLanguageDetection in IS, set in CS (duration: 01m 05s)


== 2020-02-26 ==
== 2023-05-28 ==
* 23:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T246212|T246212]] Set wgULSLanguageDetection false in CS (duration: 01m 04s)
* 13:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 23:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 04s)
* 13:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 23:54 James_F: jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[phab:T246193|T246193]] Stop setting wgAllowTitlesInSVG, never read (and this was default anyway) (duration: 01m 05s)
* 13:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 23:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:16 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 23:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 06:12 marostegui: Change innodb_fast_shutdown to 0 on db1154 before downgrading [[phab:T337446|T337446]]
* 23:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:58 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:58 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:44 foks: removing one file for legal compliance
* 22:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:25 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:52 Urbanecm: Password reset for User:Joax ([[phab:T242941|T242941]])
* 21:28 mutante: ganeti - shutting apt2001 down again
* 21:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:574454{{!}}Decrease the reads for term store for clients down to Q2Mio (T219123)]], take II (duration: 01m 04s)
* 21:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:574454{{!}}Decrease the reads for term store for clients down to Q2Mio (T219123)]] (duration: 01m 04s)
* 21:15 mutante: ganeti - re-starting apt2001 which is mysteriously broken and "half up" ..as in you can't ssh to it and don't get console but it does cause icinga alerts
* 20:35 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.21/extensions/Wikibase/lib/includes/Store/Sql/Terms: SWAT: [[gerrit:575055{{!}}Do prefetching entity ids on batches of 20 entity per query (T246159)]] (duration: 01m 04s)
* 20:20 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.21  refs [[phab:T233869|T233869]] (duration: 01m 04s)
* 20:19 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.21  refs [[phab:T233869|T233869]]
* 20:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 20:10 XioNoX: add BGP to AS4780 in Equinix Palo Alto
* 20:09 XioNoX: add BGP to AS8859 in AMS-IX
* 20:00 Amir1: Morning SWAT is done
* 19:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q6Mio (T219123)]], take II (duration: 01m 04s)
* 19:56 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q6Mio (T219123)]] (duration: 01m 02s)
* 18:09 bstorm_: downtimed labstore1004/5, cloudstore1008/9 and cloudbackup1001/2 for merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/571821
* 18:05 mutante: phab1001 - manually running community_metrics and project_changes scripts (crons) ([[phab:T244677|T244677]])
* 17:49 Amir1: setting cache type of mwdebug1001 to LCStoreStaticArray, this would break group1 and group2 in that node ([[phab:T99740|T99740]])
* 17:42 XioNoX: remove ns2 redirect to eqiad on cr3-knams
* 17:40 XioNoX: re-enable transits on cr3-esams
* 17:09 robh: cr2-esams work done, cr3-esams linecard swap starting now via [[phab:T245825|T245825]]
* 16:40 robh: please note cr2-esams work is ongoing via [[phab:T246009|T246009]] and its downtime is expected
* 16:00 jynus: deploy new grants to phabricator stats user to database on m3 [[phab:T246105|T246105]]
* 15:51 jynus: starting s2, s3 eqiad backup source data check; expect increase read traffic on db1095:3313, db1140:3312, db1078, db1090:3312 [[phab:T244958|T244958]]
* 15:25 addshore: addshore@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=1 --file=20to30holes-25feb2229 # [[phab:T219123|T219123]]
* 15:19 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:17 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 14:54 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 14:54 volans@cumin2001: START - Cookbook sre.hosts.decommission
* 14:51 volans@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:46 volans@cumin2001: START - Cookbook sre.ganeti.makevm
* 14:19 volans@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:19 volans@cumin2001: START - Cookbook sre.hosts.decommission
* 14:12 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:11 volans@cumin2001: START - Cookbook sre.hosts.decommission
* 14:05 gehel: restart of elasticsearch on cloudelastic for JVM upgrade completed
* 14:03 XioNoX: deactivate BGP to AS23930 on cr1-eqsin, will re-enable when their technical issues are fixed and they notify us
* 14:00 elukey: run apt-get clean on notebook1004 to free some space - [[phab:T224682|T224682]]
* 13:46 XioNoX: ganeti2001:~$ sudo gnt-instance shutdown apt2001.wikimedia.org - [[phab:T224576|T224576]]
* 12:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:24 kartik@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit{{!}}416973{{!}}ContentTranslation: Set cookieDomain for Production]] (duration: 01m 04s)
* 12:11 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}574469{{!}}Enable CX out of beta in eu, sw, and ta Wikipedias (T245446, T245447, T245448)]] take II (duration: 01m 05s)
* 12:10 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}574469{{!}}Enable CX out of beta in eu, sw, and ta Wikipedias (T245446, T245447, T245448)]] (duration: 01m 15s)
* 12:05 volans: uploaded spicerack_0.0.31-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 11:45 jbond42: changing uid/gid of reprepro, affects release[12]001/install[12]002
* 11:05 moritzm: rolling out remaining PHP 7.0 security updates
* 10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 10:52 moritzm: installing clamav security updates on mendelevium (ticket.wikimedia.org)
* 10:03 elukey: upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias parsoid/deployment-servers/mw-maintenance
* 09:54 elukey: upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias all-mw-eqiad
* 09:37 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 09:34 elukey: roll restart the Hadoop Analytics workers for openjdk upgrades
* 09:32 elukey: upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias all-mw-codfw
* 09:18 gehel: restarting elasticsearch on cloudelastic for JVM upgrade
* 08:51 elukey: upload prometheus-mcrouter-exporter 0.1.0+git20200225-1 to stretch-wikimedia
* 08:38 elukey: upgrade prometheus-mcrouter-exporter on mwdebug1001 to test the new version
* 06:19 marostegui: Stop MySQL and poweroff db1084 for BBU replacement - [[phab:T245647|T245647]]
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10530 and previous config saved to /var/cache/conftool/dbconfig/20200226-061710-marostegui.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Restore es1017 (master) original weight (0) [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10529 and previous config saved to /var/cache/conftool/dbconfig/20200226-061640-marostegui.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for BBU replacement - [[phab:T245647|T245647]]', diff saved to https://phabricator.wikimedia.org/P10528 and previous config saved to /var/cache/conftool/dbconfig/20200226-060906-marostegui.json
* 05:41 kart_: Updated cxserver to 2020-02-24-110149-production ([[phab:T227183|T227183]])
* 05:35 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:31 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:29 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 01:15 ejegg: updated payments-wiki from {{Gerrit|c3ca3ad6a7}} to {{Gerrit|bfae734204}}
* 00:48 eileen: civicrm revision changed from {{Gerrit|bec2d6ad9f}} to {{Gerrit|62e62e107c}}, config revision is {{Gerrit|c0ef31e2fd}}
* 00:21 James_F: Manually purged https://de.wikipedia.org/w/index.php?title=Hans-Werner_Sahm&action=history from mwmaint1002
* 00:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 03s)
* 00:15 James_F: SWAT complete.
* 00:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242381|T242381]] Set Vector skin version defaults so they can be changed on Beta Cluster (duration: 01m 04s)
* 00:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 03s)
* 00:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245792|T245792]] Enable password-reset-update on Wikivoyages and Wiktionaries (duration: 01m 04s)
* 00:08 ebernhardson: resume writes from mediawiki to cloudelastic


== 2020-02-25 ==
== 2023-05-27 ==
* 23:51 XioNoX: cr2-esams> request chassis fpc slot 0 offline - [[phab:T246009|T246009]]
* 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 ([[phab:T337446|T337446]])
* 23:38 ebernhardson: pause mediawiki writes to cloudelastic to let old gc on cloudelastic1001-chi recover
* 17:42 godog: silence systemd state alert flapping on stat1009 until monday
* 23:30 mutante: notebook1004 - disk full once again ([[phab:T232068|T232068]])
* 00:03 tzatziki: removing 1 file for legal compliance
* 23:28 mutante: adding mw2366 through mw2376 to site
* 22:17 jhuneidi@deploy1001: Synchronized php-1.35.0-wmf.21/includes/Defines.php: Update MW_VERSION to 1.35.0-wmf.21 (duration: 01m 04s)
* 22:17 mutante: scandium restarting php7.2-fpm
* 22:15 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:15 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 21:29 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.21  refs [[phab:T233869|T233869]]
* 21:19 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.21  refs [[phab:T233869|T233869]] (duration: 75m 21s)
* 20:42 eileen: process-control config revision is {{Gerrit|c0ef31e2fd}}
* 20:32 eileen: process-control config revision is {{Gerrit|e17d104c73}} slow down delete deleted contacts
* 20:28 tzatziki: reset password for ClioCJS
* 20:25 tzatziki: changing email address for ClioCJS
* 20:25 mutante: apt.wikimedia.org (current install* and new apt* roles) - going ECDSA-only and removing RSA certificate from nginx config - to support buster without having to maintain patched nginx for duplicate ssl_stapling_file directive - at the cost of slightly reduced back-compat on the public repo ([[phab:T242602|T242602]])
* 20:24 mutante: apt.wikimedia.org (current install* and new apt* roles) - going ECDSA-only and removing RSA certificate from nginx config - to support buster without having to maintain patched nginx for duplicate ssl_stapling_file directive - at the cost of slightly reduced back-compat on the public repo ([[phab:T224576|T224576]])
* 20:18 eileen: process-control config revision is {{Gerrit|e17d104c73}}
* 20:04 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.21  refs [[phab:T233869|T233869]]
* 20:01 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.19 (duration: 14m 35s)
* 19:58 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:55 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:54 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:52 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:47 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:47 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:45 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:44 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:39 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:30 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:23 longma: 1.35.0-wmf.21 was branched at {{Gerrit|ed65726f0dcaf2b163ba44426d5e780bc7f8895d}} for [[phab:T233869|T233869]]
* 19:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Decrease the reads for term store for clients back to Q2Mio (T219123)]], take II (duration: 00m 56s)
* 18:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Decrease the reads for term store for clients back to Q2Mio (T219123)]] (duration: 00m 56s)
* 18:00 jynus@cumin1001: dbctl commit (dc=all): 'increase s8 special replica weight', diff saved to https://phabricator.wikimedia.org/P10520 and previous config saved to /var/cache/conftool/dbconfig/20200225-180016-jynus.json
* 17:21 jynus@cumin1001: dbctl commit (dc=all): 'increase es1019 load to 50% [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10519 and previous config saved to /var/cache/conftool/dbconfig/20200225-172133-jynus.json
* 17:15 vgutierrez: restart ats-tls on cp1075 - [[phab:T244538|T244538]]
* 17:10 ejegg: restarted new Ingenico recurring donation charge job
* 17:02 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q6Mio (T219123)]], take II (duration: 00m 55s)
* 17:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:00 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q6Mio (T219123)]] (duration: 00m 56s)
* 16:45 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q4Mio (T219123)]], take II (duration: 00m 56s)
* 16:36 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q4Mio (T219123)]] (duration: 00m 56s)
* 16:25 vgutierrez: enable BGP in lvs2009 - [[phab:T196560|T196560]] [[phab:T245984|T245984]]
* 16:17 godog: restart debmonitor / puppetboard - [[phab:T245512|T245512]]
* 16:17 moritzm: installing pillow security updates
* 16:09 vgutierrez: update puppet compiler facts
* 16:08 XioNoX: add BGP to lvs2009 on cr1/2-codfw
* 16:02 jynus@cumin1001: dbctl commit (dc=all): 'repool es1019 with low load after maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10516 and previous config saved to /var/cache/conftool/dbconfig/20200225-160215-jynus.json
* 16:00 ejegg: restarted legacy Ingenico recurring donation charge job
* 15:59 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q2Mio (T219123)]], take II (duration: 00m 55s)
* 15:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:58 ejegg: updated Fundraising CiviCRM from {{Gerrit|88c72e39ca}} to {{Gerrit|bec2d6ad9f}}
* 15:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q2Mio (T219123)]] (duration: 00m 56s)
* 15:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q1Mio (T219123)]], take II (duration: 00m 55s)
* 15:34 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q1Mio (T219123)]] (duration: 00m 56s)
* 15:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q512K (T219123)]], take II (duration: 00m 55s)
* 15:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q512K (T219123)]] (duration: 00m 56s)
* 15:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]], take II (duration: 00m 55s)
* 15:02 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]] (duration: 00m 56s)
* 14:46 godog: roll restart netbox uwsgi - [[phab:T245511|T245511]]
* 14:40 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:39 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:39 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:39 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:37 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib: [[gerrit:574746{{!}}wbterms: only select entity terms that are requested (T246005)]] (duration: 01m 02s)
* 14:30 vgutierrez: restart pybal with BGP enabled on lvs2010 - [[phab:T245984|T245984]] [[phab:T196560|T196560]]
* 14:20 vgutierrez: update puppet compiler facts
* 14:16 bblack: dns1002 - start reimage - [[phab:T241770|T241770]]
* 14:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:574743{{!}}Reinstate wgULSLanguageDetection setting (T246071)]] (duration: 01m 03s)
* 14:14 XioNoX: add bgp session to 10.192.49.7 (lvs2010) on cr1/cr2-codfw
* 14:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:42 godog: roll-restart logstash in eqiad/codfw - [[phab:T227080|T227080]]
* 13:28 Urbanecm: mwscript updateSpecialPages.php --wiki=enwiki --override --only=Mostcategories
* 13:00 Urbanecm: Run mwscript updateSpecialPages.php --wiki=enwiki --override --only=Uncategorizedcategories, cron didn't do that for several months ([[phab:T246063|T246063]])
* 12:51 marostegui: Stop mysql on es1019 - [[phab:T243963|T243963]]
* 12:49 bblack: dns1002 - shutdown for hardware work after confirming drain of live requests - [[phab:T241770|T241770]]
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for on-site maintenance - [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10512 and previous config saved to /var/cache/conftool/dbconfig/20200225-124650-marostegui.json
* 12:44 bblack: dns1002 - downtimed, disabled puppet, and depool (stop BGP adverts) for hardware work - [[phab:T241770|T241770]]
* 12:33 Urbanecm: Run mwscript updateSpecialPages.php --wiki=enwiki --override --only=Wantedtemplates, cron didn't do that for several months ([[phab:T246063|T246063]])
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1107 for 10.4 on special groups 10 -> 50 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10511 and previous config saved to /var/cache/conftool/dbconfig/20200225-123222-marostegui.json
* 12:14 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|1f58d9a}}: New throttle rule for arwiki WikiGap ([[phab:T246092|T246092]]) (duration: 00m 56s)
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|cdde3a2}}: {{Gerrit|db90d22}} ([[phab:T245525|T245525]], [[phab:T243359|T243359]]) (duration: 00m 58s)
* 10:11 volans: re-enabling puppet on A:swift-be-eqiad
* 09:31 volans: re-enabling puppet on A:swift-be-codfw
* 09:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:10 addshore: addshore@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=1 --file=10to20holes-24feb1345 # [[phab:T219123|T219123]]
* 09:09 addshore: addshore@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=1 --file=10to20holes-24feb1345
* 08:23 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing in main API and special groups - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10510 and previous config saved to /var/cache/conftool/dbconfig/20200225-075304-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to analyze recentchanges table - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10508 and previous config saved to /var/cache/conftool/dbconfig/20200225-065741-marostegui.json
* 06:02 marostegui: Move labsdb1010 under db2094:3318 - [[phab:T232446|T232446]]
* 02:59 ejegg: updated Fundraising CiviCRM from {{Gerrit|b9d1acdb6d}} to {{Gerrit|88c72e39ca}}
* 01:12 jforrester@deploy1001: Synchronized wmf-config/interwiki.php: [[phab:T238803|T238803]]: Update interwiki cache (duration: 00m 56s)
* 00:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T238803|T238803]]: Drop ability to load SkinPerPage, EUCopyrightCampaign, and EUCopyrightCampaignSkin (duration: 00m 56s)
* 00:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T238803|T238803]]: Remove all IS config related to the fixcopyrightwiki wiki (duration: 00m 55s)
* 00:51 James_F: Ran `DELETE FROM globalimagelinks WHERE gil_wiki='fixcopyrightwiki';` - one row removed [[phab:T238803|T238803]]
* 00:51 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop trying to read wmgUseSkinPerPage or wmgUseEUCopyrightCampaign (duration: 00m 55s)
* 00:48 James_F: Confirmed no SUL entries for fixcopyrightwiki as expected [[phab:T238803|T238803]]
* 00:47 jforrester@deploy1001: Synchronized static/images/project-logos/: [[phab:T238803|T238803]]: Remove fixcopyrightwiki project logos (duration: 00m 56s)
* 00:46 ejegg: updated Fundraising CiviCRM from {{Gerrit|87b13fd3b5}} to {{Gerrit|b9d1acdb6d}}
* 00:46 jforrester@deploy1001: Synchronized dblists/: [[phab:T238803|T238803]]: Remove fixcopyrightwiki from dblists in general (duration: 00m 58s)
* 00:45 jforrester@deploy1001: rebuilt and synchronized wikiversions files: [[phab:T238803|T238803]]: Remove fixcopyrightwiki from wikiversions
* 00:43 jforrester@deploy1001: Synchronized dblists/all.dblist: [[phab:T238803|T238803]]: Remove fixcopyrightwiki from all.dblist (duration: 00m 56s)
* 00:39 jforrester@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 00:38 ejegg: disabled recurring donation charge jobs for CiviCRM update
* 00:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMaxGeneratedPPNodeCount or wgParserConf::preprocessorClass, never read (duration: 00m 56s)
* 00:23 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T245983|T245983]] Read wmgApprovedContentSecurityPolicyDomains for CSP (duration: 00m 56s)
* 00:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245983|T245983]] Set wmgApprovedContentSecurityPolicyDomains (duration: 00m 57s)


== 2020-02-24 ==
== 2023-05-26 ==
* 22:58 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:48 tzatziki: removing 2 files for legal compliance
* 22:38 XioNoX: redirect ns2 to authdns1001
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:34 mutante: stat1007  sudo systemctl reset-failed to clear Icinga alerts about reportupdater-pingback.service
* 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:22 XioNoX: disable transits on cr3-esams
* 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 21:43 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@f87bdd9]: Take service name into account for consumer group name [[phab:T244387|T244387]] (duration: 01m 14s)
* 20:47 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 21:42 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@f87bdd9]: Take service name into account for consumer group name [[phab:T244387|T244387]]
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:37 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 19:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 21:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 21:23 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 19:15 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 21:05 eileen: civicrm revision changed from {{Gerrit|fffc215e75}} to {{Gerrit|87b13fd3b5}}, config revision is {{Gerrit|561ae21f77}}
* 19:15 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 20:58 XioNoX: test flowspec BGP config on cr3-knams
* 18:26 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 20:32 XioNoX: load new FW policies on pfw3-eqiad/codfw - [[phab:T246036|T246036]]
* 17:38 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 06m 10s)
* food: updated Fundraising CiviCRM from {{Gerrit|426e3547ca}} to {{Gerrit|fffc215e75}}
* 17:31 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 20:03 eileen: civicrm revision changed from {{Gerrit|c086fd4e0b}} to {{Gerrit|426e3547ca}}, config revision is {{Gerrit|561ae21f77}}
* 16:37 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard2003.codfw.wmnet with OS bookworm
* 20:02 mutante: installing OS on new ganeti VMs apt1001 and apt2001.wikimedia.org for buster APT repos
* 16:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard1003.eqiad.wmnet with OS bookworm
* 19:07 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Changes here are only used in tests right now, but keep line numbers sync'ed (duration: 00m 56s)
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:46 mutante: deploying cluster apache config change - adds gr.wikimedia.org vhost and refreshes apache2
* 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 17:10 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: Sync doc-only change; should be a no-op (duration: 00m 57s)
* 15:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
* 16:16 jynus: reloading ferm on ms-be2028 DNS query timed out
* 15:50 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 16:11 jynus: reloading ferm on ms-be2043 DNS query timed out
* 15:41 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
* 16:02 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]], take II (duration: 00m 56s)
* 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:57 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]] (duration: 00m 56s)
* 15:40 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
* 15:30 moritzm: updated component/jdk8 to 8u242-b08-1~deb10u1 (forward port of latest Java 8 security update)
* 15:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce weight for db1126, increase it a bit for db1101:3318', diff saved to https://phabricator.wikimedia.org/P10498 and previous config saved to /var/cache/conftool/dbconfig/20200224-152132-marostegui.json
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:05 marostegui: Deploy schema change on db1086 (s7 master) with replication - [[phab:T245925|T245925]]
* 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 14:59 marostegui: read_only=0 on es1020 (es4) and es1023 (es5) - unused new external store masters - [[phab:T245806|T245806]]
* 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 14:56 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]], take II (duration: 00m 55s)
* 15:31 nskaggs@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
* 14:55 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]] (duration: 00m 57s)
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:47 andrew@deploy1001: Finished deploy [horizon/deploy@dab0ca0]: modest css change for the hiera editing dialog (take two -- I consistently forget to rebase before doing this) (duration: 03m 33s)
* 15:08 nskaggs@cumin1001: START - Cookbook sre.wikireplicas.update-views
* 14:44 andrew@deploy1001: Started deploy [horizon/deploy@dab0ca0]: modest css change for the hiera editing dialog (take two -- I consistently forget to rebase before doing this)
* 14:26 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=videoscaler,dc=eqiad,name=parse.*
* 14:43 andrew@deploy1001: Finished deploy [horizon/deploy@a8f2ea9]: modest css change for the hiera editing dialog (duration: 00m 12s)
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=parse.*
* 14:43 andrew@deploy1001: Started deploy [horizon/deploy@a8f2ea9]: modest css change for the hiera editing dialog
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 14:42 marostegui: Compress innodb on wb_terms on db1087 - [[phab:T232446|T232446]]
* 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
* 14:03 _joe_: depooling esams (authdns-update)
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
* 13:51 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q120K (T219123)]], take II (duration: 00m 55s)
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 13:48 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q120K (T219123)]] (duration: 00m 56s)
* 14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 13:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q60K (T219123)]], take II (duration: 00m 56s)
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
* 13:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q60K (T219123)]] (duration: 00m 56s)
* 14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
* 13:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q30K (T219123)]], take II (duration: 00m 56s)
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q30K (T219123)]] (duration: 00m 56s)
* 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 13:05 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 18s)
* 14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
* 13:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573965{{!}}Disallow crats to (un)assign flow-bot group on enwiki (T245716)]] (duration: 00m 56s)
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
* 12:59 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573965{{!}}Disallow crats to (un)assign flow-bot group on enwiki (T245716)]] (duration: 00m 56s)
* 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 12:48 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:574398{{!}} Bumping portals to master (563985)]] (duration: 00m 56s)
* 14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 12:47 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:574398{{!}} Bumping portals to master (563985)]] (duration: 00m 56s)
* 14:02 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 12:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:571738{{!}}Add definitions for redirect badges (T235420)]], take II, the cache issue (duration: 00m 56s)
* 14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
* 12:37 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:571738{{!}}Add definitions for redirect badges (T235420)]] (duration: 00m 56s)
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
* 12:23 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/client/includes: SWAT: [[gerrit:574391{{!}}Use formatter cache in client LUA label lookups (T245740)]] (duration: 00m 56s)
* 14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
* 12:19 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/dumpInterwiki.php: dumpInterwiki: Respect comments in dblists ([[phab:T244906|T244906]]) (duration: 00m 56s)
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574265{{!}}CX: Adjust MT threshold for Telugu WP to 70% (T244769)]] (duration: 00m 56s)
* 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 12:05 XioNoX: re-enable deactivated BGP sessions from ulsfo to office - [[phab:T239893|T239893]]
* 14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
* 12:02 vgutierrez: reimage pybal-test2001 as buster - [[phab:T224570|T224570]] [[phab:T245984|T245984]]
* 13:58 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 11:49 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:574398{{!}} Bumping portals to master (563985)]] (duration: 00m 55s)
* 13:58 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetboard2003.codfw.wmnet
* 11:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:574398{{!}} Bumping portals to master (563985)]] (duration: 00m 57s)
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 11:27 vgutierrez: upload pybal 1.15.8 to apt.wm.o (buster) - [[phab:T245984|T245984]]
* 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:06 volans: restarted ferm on ms-be2046
* 13:56 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 11:02 marostegui: Move labsdb1009, labsdb1011 and labsdb1012 (labsdb1010 is currently delayed, will be done later) to replicate under codfw for a few days while we alter wb_terms on db1087 - [[phab:T232446|T232446]]
* 13:56 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 10:59 effie: upgrading scap in eqiad and codfw - [[phab:T245530|T245530]]
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb1003.eqiad.wmnet
* 10:55 volans: restarted ferm on ms-be2016, had failed with DNS query for 'ms-be2056.codfw.wmnet' failed: query timed out
* 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:41 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase: [[gerrit:574386{{!}}Add metric for recording cache hits in StatsdRecordingSimpleCache]] ([[phab:T244260|T244260]]) (duration: 01m 04s)
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
* 10:34 godog: onboard netbox to logging pipeline
* 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:12 marostegui: Stop db1087 and db2079 in sync - [[phab:T232446|T232446]]
* 13:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for compression and place db1101:3318 into vslow,dump - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10493 and previous config saved to /var/cache/conftool/dbconfig/20200224-101030-marostegui.json
* 13:51 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 09:21 godog: bounce ferm on ms-be2023, it had failed (no entries in journald)
* 13:46 jbond@cumin2002: START - Cookbook sre.dns.netbox
* 09:08 elukey: update puppet compiler's facts
* 13:46 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add instances to es5 eqiad - [[phab:T245806|T245806]]', diff saved to https://phabricator.wikimedia.org/P10492 and previous config saved to /var/cache/conftool/dbconfig/20200224-084027-marostegui.json
* 13:45 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 08:34 marostegui@deploy1001: Synchronized wmf-config/etcd.php: Add es4 and es5 (unused new external store sections) to etcd - [[phab:T245806|T245806]] (duration: 00m 58s)
* 13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
* 08:29 marostegui: Temporarily put es1020 (es4) and es1023 (es5) in RO at the MySQL level - [[phab:T245806|T245806]]
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add instances to es5 codfw - [[phab:T245806|T245806]]', diff saved to https://phabricator.wikimedia.org/P10491 and previous config saved to /var/cache/conftool/dbconfig/20200224-082848-marostegui.json
* 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Add instances to es4 eqiad - [[phab:T245806|T245806]]', diff saved to https://phabricator.wikimedia.org/P10490 and previous config saved to /var/cache/conftool/dbconfig/20200224-080708-marostegui.json
* 13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Add instances to es4 codfw - [[phab:T245806|T245806]]', diff saved to https://phabricator.wikimedia.org/P10489 and previous config saved to /var/cache/conftool/dbconfig/20200224-080128-marostegui.json
* 13:06 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 07:31 cdanis: dbctl: edit es4/es5 sections in eqiad (flavor & master & min_replicas fields) [[phab:T245806|T245806]]
* 12:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
* 07:30 cdanis: dbctl: (and min_replicas field) [[phab:T245806|T245806]]
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:29 cdanis: dbctl: edit es4/es5 sections in codfw (flavor & master fields) [[phab:T245806|T245806]]
* 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1107 for 10.4 testing in special slaves group with weight 10 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10488 and previous config saved to /var/cache/conftool/dbconfig/20200224-071201-marostegui.json
* 12:41 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing in main and API - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10487 and previous config saved to /var/cache/conftool/dbconfig/20200224-070337-marostegui.json
* 12:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318 after removing partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10486 and previous config saved to /var/cache/conftool/dbconfig/20200224-064044-marostegui.json
* 12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318 after removing partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10485 and previous config saved to /var/cache/conftool/dbconfig/20200224-063258-marostegui.json
* 12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis {{!}} [[phab:T332474|T332474]]
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318 after removing partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10484 and previous config saved to /var/cache/conftool/dbconfig/20200224-062226-marostegui.json
* 11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318 after removing partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10483 and previous config saved to /var/cache/conftool/dbconfig/20200224-060118-marostegui.json
* 11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]] (duration: 00m 08s)
* 05:57 marostegui: Repool labsdb1011 - [[phab:T245797|T245797]]
* 11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing {{!}} [[phab:T332474|T332474]]
* 10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
* 10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
* 09:54 effie: pool parse1013-parse1016 to the jobrunner cluster - [[phab:T329366|T329366]]
* 09:29 jbond: disable puppet fleet wide to deploy minor puppet change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923353
* 09:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1016.eqiad.wmnet with OS buster
* 09:26 effie: parse1013-parse1016 have been depooled and removed from the parsoid-php service - [[phab:T329366|T329366]]
* 09:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1014.eqiad.wmnet with OS buster
* 09:24 jnuche@deploy1002: Installation of scap version "4.52.3" completed for 596 hosts
* 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1013.eqiad.wmnet with OS buster
* 09:23 jnuche@deploy1002: Installing scap version "4.52.3" for 596 hosts
* 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 09:13 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 09:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parse1015.eqiad.wmnet with OS buster
* 08:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 08:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
* 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
* 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1016.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1015.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1014.eqiad.wmnet with OS buster
* 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1013.eqiad.wmnet with OS buster
* 08:10 jiji@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse101[3-6].eqiad.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48591 and previous config saved to /var/cache/conftool/dbconfig/20230526-075903-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48590 and previous config saved to /var/cache/conftool/dbconfig/20230526-075809-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48589 and previous config saved to /var/cache/conftool/dbconfig/20230526-074358-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48588 and previous config saved to /var/cache/conftool/dbconfig/20230526-074304-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48587 and previous config saved to /var/cache/conftool/dbconfig/20230526-072854-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48586 and previous config saved to /var/cache/conftool/dbconfig/20230526-072759-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48585 and previous config saved to /var/cache/conftool/dbconfig/20230526-071349-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48584 and previous config saved to /var/cache/conftool/dbconfig/20230526-071255-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48583 and previous config saved to /var/cache/conftool/dbconfig/20230526-065844-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48582 and previous config saved to /var/cache/conftool/dbconfig/20230526-065750-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48581 and previous config saved to /var/cache/conftool/dbconfig/20230526-064340-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48580 and previous config saved to /var/cache/conftool/dbconfig/20230526-064245-root.json
* 06:42 elukey: `apt-get clean` on stat1008 to clean up some space in the root partition
* 06:36 elukey: `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to keep the root partition from filling up
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48579 and previous config saved to /var/cache/conftool/dbconfig/20230526-062835-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48578 and previous config saved to /var/cache/conftool/dbconfig/20230526-062741-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48577 and previous config saved to /var/cache/conftool/dbconfig/20230526-061330-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48576 and previous config saved to /var/cache/conftool/dbconfig/20230526-061236-root.json
* 03:51 fab@deploy1002: Finished deploy [airflow-dags/research@77cf676]: (no justification provided) (duration: 00m 17s)
* 03:51 fab@deploy1002: Started deploy [airflow-dags/research@77cf676]: (no justification provided)


== 2020-02-23 ==
== 2023-05-25 ==
* 16:52 elukey: powercycle mw1372 - no mgmt console, no ssh
* 22:14 zabe@deploy1002: Finished scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] (duration: 09m 14s)
* 15:17 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='𐰇𐱅𐰚𐰤' /home/urbanecm/T245950 ([[phab:T245950|T245950]])
* 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 22:05 zabe@deploy1002: Started scap: Backport for [[gerrit:923283{{!}}Replace deprecated Hooks::runWithoutAbort (T335536)]], [[gerrit:923276{{!}}BannerRenderer: Make sure the language variant is valid (T337427)]]
* 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
* 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676]: (no justification provided)
* 20:47 TheresNoTime: close UTC late backport
* 20:47 samtar@deploy1002: Finished scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] (duration: 08m 34s)
* 20:40 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:38 samtar@deploy1002: Started scap: Backport for [[gerrit:923282{{!}}Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)]]
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] (duration: 10m 58s)
* 20:22 samtar@deploy1002: jdrewniak and samtar: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:21 samtar@deploy1002: Started scap: Backport for [[gerrit:923281{{!}}Use document feature classes to extract A/B test state (T335972)]]
* 20:13 samtar@deploy1002: Finished scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] (duration: 08m 31s)
* 20:06 samtar@deploy1002: samtar and daimona: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:05 samtar@deploy1002: Started scap: Backport for [[gerrit:919838{{!}}[prod] Configure logging for the CampaignEvents channel (T337365)]]
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 19:31 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
* 19:29 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48575 and previous config saved to /var/cache/conftool/dbconfig/20230525-190946-root.json
* 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48574 and previous config saved to /var/cache/conftool/dbconfig/20230525-190859-root.json
* 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48573 and previous config saved to /var/cache/conftool/dbconfig/20230525-185441-root.json
* 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48572 and previous config saved to /var/cache/conftool/dbconfig/20230525-185354-root.json
* 18:43 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@6b27584]: (no justification provided) (duration: 00m 19s)
* 18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
* 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
* 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
* 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
* 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
* 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
* 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
* 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
* 17:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48558 and previous config saved to /var/cache/conftool/dbconfig/20230525-172326-root.json
* 17:15 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
* 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
* 17:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
* 17:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
* 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 17:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct ([[phab:T328313|T328313]])
* 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
* 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
* 16:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:14 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:14 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
* 16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
* 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
* 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
* 15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
* 15:49 dancy@deploy1002: Finished deploy [integration/docroot@dac2b70]: Updated Scap URLs (duration: 00m 07s)
* 15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
* 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
* 15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
* 15:43 dancy@deploy1002: Finished deploy [integration/docroot@78e6f40]: (no justification provided) (duration: 00m 10s)
* 15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
* 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
* 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
* 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
* 15:27 kartik@deploy1002: Finished scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 07m 01s)
* 15:22 kartik@deploy1002: kartik: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
* 15:20 kartik@deploy1002: Started scap: Backport for [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json
* 15:18 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 68m 07s)
* 15:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
* 15:10 topranks: Migrating cr1-eqiad downlink to row E/F from lsw1-e1-eqiad et-0/0/48 to ssw1-e1-eqiad et-0/0/31
* 15:10 mutante: gerrit-replica.wikimedia.org - gerrit2002 - reimaging - scheduled maintenance
* 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
* 15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
* 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json
* 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json
* 14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 [[phab:T337446|T337446]]
* 14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
* 14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
* 14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
* 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
* 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1025']
* 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1024']
* 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
* 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
* 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
* 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
* 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
* 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad
* 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 14:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 14:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=parsoid-php,dc=eqiad
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
* 14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:11 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:10 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 ([[phab:T336886|T336886]])', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
* 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 14:08 kartik@deploy1002: Finished scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] (duration: 15m 56s)
* 13:53 kartik@deploy1002: kartik: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:52 kartik@deploy1002: Started scap: Backport for [[gerrit:923268{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]], [[gerrit:923269{{!}}Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)]]
* 13:46 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]] (duration: 07m 42s)
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:38 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923252{{!}}Change maint script to do work via jobs]]
* 13:28 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] (duration: 09m 06s)
* 13:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:20 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:19 urbanecm@deploy1002: Started scap: Backport for [[gerrit:923273{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]], [[gerrit:923274{{!}}Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)]]
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s5, s2, s7', diff saved to https://phabricator.wikimedia.org/P48538 and previous config saved to /var/cache/conftool/dbconfig/20230525-121012-root.json
* 11:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 11:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 11:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 11:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48537 and previous config saved to /var/cache/conftool/dbconfig/20230525-113914-root.json
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 11:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 11:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 11:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48536 and previous config saved to /var/cache/conftool/dbconfig/20230525-112409-root.json
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
* 11:22 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
* 11:21 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
* 11:20 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
* 11:15 jbond: update udplog on mwlog server
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48535 and previous config saved to /var/cache/conftool/dbconfig/20230525-110948-root.json
* 11:09 jbond: upload udplog_1.10_amd64.deb
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48534 and previous config saved to /var/cache/conftool/dbconfig/20230525-110905-root.json
* 11:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 11:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 11:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 10:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48533 and previous config saved to /var/cache/conftool/dbconfig/20230525-105443-root.json
* 10:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
* 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48532 and previous config saved to /var/cache/conftool/dbconfig/20230525-105400-root.json
* 10:53 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 10:52 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
* 10:49 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:48 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
* 10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
* 10:32 aborrero@cumin2002: START - Cookbook sre.dns.netbox
* 10:24 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2005-dev.wikimedia.org
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48528 and previous config saved to /var/cache/conftool/dbconfig/20230525-102434-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48527 and previous config saved to /var/cache/conftool/dbconfig/20230525-102351-root.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48526 and previous config saved to /var/cache/conftool/dbconfig/20230525-101940-root.json
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48525 and previous config saved to /var/cache/conftool/dbconfig/20230525-100927-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48524 and previous config saved to /var/cache/conftool/dbconfig/20230525-100846-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48523 and previous config saved to /var/cache/conftool/dbconfig/20230525-100436-root.json
* 10:00 kart_: Updated cxserver to 2023-05-25-093623-production (config: language pairs transform fix + [[phab:T331201|T331201]])
* 09:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 09:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48522 and previous config saved to /var/cache/conftool/dbconfig/20230525-095423-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48521 and previous config saved to /var/cache/conftool/dbconfig/20230525-095341-root.json
* 09:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 09:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48520 and previous config saved to /var/cache/conftool/dbconfig/20230525-094931-root.json
* 09:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
* 09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000  to dumpsdata1006, copying all public xml dumps data
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
* 09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
* 08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - [[phab:T337248|T337248]]
* 07:52 matthiasmullie: UTC morning backports done
* 07:51 mlitn@deploy1002: Finished scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] (duration: 16m 12s)
* 07:37 mlitn@deploy1002: mlitn: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:35 mlitn@deploy1002: Started scap: Backport for [[gerrit:922853{{!}}Change maint script to do work via jobs (T322872)]]
* 07:18 mlitn@deploy1002: Finished scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] (duration: 14m 02s)
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P48511 and previous config saved to /var/cache/conftool/dbconfig/20230525-071719-root.json
* 07:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 07:06 mlitn@deploy1002: mlitn: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:04 mlitn@deploy1002: Started scap: Backport for [[gerrit:921561{{!}}[WikibaseMediaInfo] Add 'main subject of' property]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1196', diff saved to https://phabricator.wikimedia.org/P48509 and previous config saved to /var/cache/conftool/dbconfig/20230525-064418-root.json
* 06:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P48506 and previous config saved to /var/cache/conftool/dbconfig/20230525-055734-root.json
* 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: [[phab:T337446|T337446]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P48504 and previous config saved to /var/cache/conftool/dbconfig/20230525-055236-root.json
* 05:48 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 05:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P48503 and previous config saved to /var/cache/conftool/dbconfig/20230525-051923-root.json
* 02:14 eileen: civicrm upgraded from {{Gerrit|b8cab6f6}} to {{Gerrit|415aa7e5}}


== 2020-02-22 ==
* 03:41 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 03:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:17 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 02:16 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 02:13 mutante: ganeti - removing instances apt1001/apt2001 again, starting over
* 01:53 mutante: starting new ganeti VMs apt1001 and apt2001 for OS install (WIP, not prod)
* 01:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:45 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

== 2023-05-24 ==
* 21:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] (duration: 09m 40s)
* 21:10 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:08 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922921{{!}}[Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)]]
* 20:55 samtar@deploy1002: Finished scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 15s)
* 20:48 samtar@deploy1002: samtar: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 20:47 samtar@deploy1002: Started scap: Backport for [[gerrit:922855{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] (duration: 08m 31s)
* 20:18 samtar@deploy1002: samtar: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:922854{{!}}ipInfo.hooks: Use wgRelevantUserName (T337373)]]
* 20:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:08 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:12 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 19:06 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9  refs [[phab:T330216|T330216]]
* 18:55 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 06m 00s)
* 18:49 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:22 ejegg: civicrm upgraded from {{Gerrit|4251dfa1}} to {{Gerrit|b8cab6f6}}
* 16:54 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance (duration: 00m 09s)
* 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@1603ecf]: Deploying [[phab:T336800|T336800]] on platform_eng Airflow instance
* 16:05 elukey: move kafka mirror on kafka main brokers to PKI - [[phab:T337248|T337248]]
* 16:01 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] (duration: 08m 33s)
* 15:56 elukey: move kafka mirror on kafka jumbo brokers to PKI - [[phab:T337248|T337248]]
* 15:54 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 15:52 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922852{{!}}Personalized praise: Add instrumentation (T325117)]], [[gerrit:922851{{!}}Personalized praise: Add instrumentation (T325117)]]
* 15:47 ejegg: payments-wiki upgraded from {{Gerrit|e02bc7c5}} to {{Gerrit|c2f9f8b5}}
* 15:39 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] (duration: 01m 35s)
* 15:38 ejegg: standalone SmashPig upgraded from {{Gerrit|5460dbe2}} to {{Gerrit|db23b998}}
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363]
* 15:37 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] (duration: 00m 04s)
* 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363]
* 15:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:32 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:31 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:31 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] (duration: 06m 13s)
* 15:31 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:30 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:26 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:25 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:25 aqu@deploy1002: Started deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363]
* 15:24 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:18 aqu: analytics-refinery, about to deploy
* 15:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
* 14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
* 14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P<nowiki>{</nowiki>puppetboard2002.codfw.wmnet<nowiki>}</nowiki> and (A:puppetboard)
* 14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
* 14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
* 14:19 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] (duration: 12m 11s)
* 14:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]] (duration: 00m 07s)
* 14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation {{!}} [[phab:T332474|T332474]]
* 14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:06 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922838{{!}}Enable DiscussionTools newtopictool on fiwiki (T317375)]]
* 14:06 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] (duration: 09m 21s)
* 13:58 urbanecm@deploy1002: matmarex and urbanecm and sgimeno: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]] synced t
* 13:56 urbanecm@deploy1002: Started scap: Backport for [[gerrit:922405{{!}}MultiPaneDialog: remove attribute hidden instead of class (T337256)]], [[gerrit:920238{{!}}Add maint script to opt out active users from the new topic tool (T317375)]], [[gerrit:920731{{!}}Define $maintClass in maintenance script for compatibility (T317375)]], [[gerrit:920733{{!}}NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)]]
* 13:55 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]] (duration: 07m 06s)
* 13:48 urbanecm@deploy1002: Started scap: Backport for [[gerrit:918500{{!}}[Growth] Add mediawiki.mentor_dashboard.interaction (T325117)]]
* 13:36 samtar@deploy1002: Finished scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] (duration: 08m 04s)
* 13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:28 samtar@deploy1002: Started scap: Backport for [[gerrit:922810{{!}}Enable Kartographer Nearby on remaining wikis (T336834)]]
* 13:26 samtar@deploy1002: Finished scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] (duration: 10m 15s)
* 13:17 samtar@deploy1002: samtar and dcausse: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:16 samtar@deploy1002: Started scap: Backport for [[gerrit:801792{{!}}[cirrus] Fix typo in config var]]
* 13:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] (duration: 07m 53s)
* 13:07 samtar@deploy1002: herron and samtar: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:07 xSavitar: tools.codesearch Deployed https://gerrit.wikimedia.org/r/c/labs/codesearch/+/909258 and also restarted tool instances as the core search backend was dead.
* 13:06 samtar@deploy1002: Started scap: Backport for [[gerrit:920298{{!}}arclamp: switch redis server to arclamp1001 (T327277)]]
* 12:55 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript findBadBlobs --wiki nowiki --revisions 5227369 --mark T337392` [[phab:T337392|T337392]]
* 12:47 tgr_: running changeWikiConfig.php on Growth pilot wikis for [[phab:T337348|T337348]]
* 10:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
* 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
* 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
* 09:04 dcausse@deploy1002: Finished deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset (duration: 00m 17s)
* 09:04 dcausse@deploy1002: Started deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset
* 08:52 claime: repooling mw2248.codfw.wmnet - [[phab:T334429|T334429]]
* 08:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
* 08:50 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
* 08:49 marostegui: Stop mariadb on db1154 (sanitarium) there will be lag on clouddb* hosts
* 08:36 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]] (duration: 07m 20s)
* 08:28 urbanecm@deploy1002: Started scap: Backport for [[gerrit:921599{{!}}Migrate GrowthExperiments config to its own file (T308932)]]
* 07:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 07:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 07:41 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 07:40 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
* 05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
* 01:19 mutante: contint2001 - jenkins started again
* 01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:45 mutante: short maintenance on main contint server (jenkins)
* 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance


== 2020-02-21 ==
== 2023-05-23 ==
* 23:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
* 23:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 23:05 andrewbogott: updated (?) wikitech-static to 1.34.0
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
* 22:01 sbassett@deploy1001: Finished scap: Deploy security fix for [[phab:T232932|T232932]] (duration: 05m 35s)
* 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
* 21:56 sbassett@deploy1001: Started scap: Deploy security fix for [[phab:T232932|T232932]]
* 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - [[phab:T324659|T324659]]
* 21:53 andrew@deploy1001: Finished deploy [horizon/deploy@a8f2ea9]: added a warning about the public git history to the hiera edit panel -- take two (duration: 03m 41s)
* 22:00 eileen: civicrm upgraded from {{Gerrit|11538e23}} to {{Gerrit|4251dfa1}}
* 21:49 andrew@deploy1001: Started deploy [horizon/deploy@a8f2ea9]: added a warning about the public git history to the hiera edit panel -- take two
* 21:26 ejegg: payments-wiki upgraded from {{Gerrit|a7567c6a}} to {{Gerrit|e02bc7c5}}
* 21:45 andrew@deploy1001: Finished deploy [horizon/deploy@13ca90a]: added a warning about the public git history to the hiera edit panel (duration: 00m 11s)
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:45 andrew@deploy1001: Started deploy [horizon/deploy@13ca90a]: added a warning about the public git history to the hiera edit panel
* 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:23 mutante: LDAP - added ldickinson to wmf
* 21:02 TheresNoTime: close UTC late backport window
* 21:23 mutante: LDAP - added dduvall to archiva-deployers
* 21:01 samtar@deploy1002: Finished scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] (duration: 11m 47s)
* 21:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 21:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:51 samtar@deploy1002: ksarabia and samtar: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:50 samtar@deploy1002: Started scap: Backport for [[gerrit:922572{{!}}Turn on the A/B test for testwiki (T336969)]]
* 20:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:48 samtar@deploy1002: Finished scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] (duration: 11m 20s)
* 20:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:38 samtar@deploy1002: samtar: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:29 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:37 ejegg: civicrm upgraded from {{Gerrit|efe25c9b}} to {{Gerrit|11538e23}}
* 20:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:37 samtar@deploy1002: Started scap: Backport for [[gerrit:922397{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]], [[gerrit:922398{{!}}Remove centraluserid dependency in ABRequirement.php (T336969)]]
* 18:34 XioNoX: re-enable GRE tunnels on cr3-esams - [[phab:T245825|T245825]]
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:55 XioNoX: add gobgpd to buster-wikimedia repo
* 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:06 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:38 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/includes/resourceloader/ResourceLoaderSkinModule.php: [[phab:T245778|T245778]] [[phab:T245182|T245182]] [[phab:T232140|T232140]] (duration: 01m 00s)
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:29 mark: cr3-esams: Shutdown GRE tunnels over Telia
* 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:27 akosiaris: repool mathoid at eqiad, test complete
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:27 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:20 moritzm: rebooting boron
* 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:20 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:17 moritzm: bumped memory for boron.eqiad.wmnet to 16G
* 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:04 mark: cr3-esams: request chassis fpc offline slot 1
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:57 mark: Disabled Telia transit on cr3-esams
* 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:57 mark: Set VRRP prio cost to 50 on cr3-esams to make it backup VRRP
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:48 elukey: restart varnishkafka-webrequest on cp3052 (stuck in timeouts to kafka, analytics alarms raised)
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:47 elukey: restart varnishkafka-webrequest on cp3056/cp3058/cp3054/cp3064 (stuck in timeouts to kafka, analytics alarms raised)
* 19:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 11:39 elukey: restart varnishkafka on cp3057 (stuck in timeouts to kafka, analytics alarms raised)
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:21 godog: bounce logstash on logstash1023 - see if can catch up with elastic7 kafka lag
* 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:14 elukey: reboot stat1005 - GPU blocked at 100% after issue with tensorflow
* 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:18 akosiaris: depool mathoid in eqiad for a test
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:18 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
* 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10473 and previous config saved to /var/cache/conftool/dbconfig/20200221-085405-marostegui.json
* 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 08:34 fdans@deploy1001: Finished deploy [analytics/refinery@4d56021]: deploying refinery (duration: 14m 55s)
* 19:42 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
* 08:19 fdans@deploy1001: Started deploy [analytics/refinery@4d56021]: deploying refinery
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:02 akosiaris: disable mod_remoteip on otrs host, following merge of https://gerrit.wikimedia.org/r/573877
* 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 06:58 marostegui: Stop MySQL on labsdb1012 to clone labsdb1011 - [[phab:T245797|T245797]]
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:58 marostegui: Stop MySQL on labsdb1012 to clone labsdb1011 -
* 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 06:34 marostegui: Stop mysql on es1024 to clone es1025 - [[phab:T243052|T243052]]
* 19:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  dbproxy102<nowiki>{</nowiki>2..7<nowiki>}</nowiki> - jclark@cumin1001"
* 05:57 marostegui: Start MySQL on labsdb1011 without replication - [[phab:T245797|T245797]]
* 19:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 05:44 marostegui: Reload haproxy on dbproxy1010, dbproxy1011, dbproxy18 - [[phab:T245797|T245797]]
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1027
* 02:53 bstorm_: depooled labsdb1011 and set weight 10 on labsdb1009 vs 3 on labsdb1010 [[phab:T245797|T245797]]
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1027
* 02:43 ejegg: updated Fundraising CiviCRM from {{Gerrit|a6b222c19f}} to {{Gerrit|c086fd4e0b}}
* 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1026
* 02:27 bstorm_: stopped mariadb on labsdb1011 because it keeps crashing anyway
* 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1026
* 01:05 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Sync Beta-Cluster-only change to CommonSettings now we're sure we won't revert (duration: 00m 56s)
* 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
* 01:04 andrew@deploy1001: Finished deploy [horizon/deploy@13ca90a]: Remove guided puppet config mode; this gets us back to working with latest puppet packages. (duration: 03m 32s)
* 19:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 01:01 andrew@deploy1001: Started deploy [horizon/deploy@13ca90a]: Remove guided puppet config mode; this gets us back to working with latest puppet packages.
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:31 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1025
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
* 19:30 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
* 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
* 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
* 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
* 19:25 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
* 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1022
* 19:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 19:24 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1022
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts [[phab:T337327|T337327]]
* 18:26 ryankemper: [WDQS] [[phab:T337327|T337327]] Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
* 18:16 ryankemper: [WDQS] Rolled back requestctl rule
* 18:12 ryankemper: [WDQS] [[phab:T337327|T337327]] New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:03 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]] and [[phab:T333140|T333140]]
* 17:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
* 16:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:50 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]], part 2
* 16:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
* 16:43 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
* 16:42 sbassett: Deployed updated security mitigation for [[phab:T336027|T336027]]
* 16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
* 16:31 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Rename page content change enrich error stream to match convention - [[phab:T336656|T336656]] (duration: 06m 58s)
* 16:22 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]] (duration: 36m 02s)
* 15:56 topranks: moving lvs1018 connection to rack E1 from lsw1-e1-eqiad to ssw1-e1-eqiad [[phab:T322937|T322937]]
* 15:46 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys [[phab:T322937|T322937]]
* 15:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:45 sukhe: stop pybal on lvs1018: [[phab:T322937|T322937]]
* 15:38 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bullseye
* 15:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:22 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:20 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
* 15:20 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:19 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 15:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bullseye
* 15:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases1003.eqiad.wmnet with OS bullseye
* 15:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:00 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
* 14:58 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:58 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
* 14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
* 14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
* 14:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
* 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
* 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
* 14:01 herron@cumin1001: START - Cookbook sre.dns.netbox
* 14:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
* 13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
* 13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
* 13:50 herron@cumin1001: START - Cookbook sre.dns.netbox
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
* 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
* 13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
* 13:45 hoo@deploy1002: Finished scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] (duration: 12m 49s)
* 13:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 13:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 13:43 eoghan@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases1003.eqiad.wmnet
* 13:33 hoo@deploy1002: hoo: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:32 hoo@deploy1002: Started scap: Backport for [[gerrit:922394{{!}}Restore targets declarations temporarily (T336956)]], [[gerrit:922395{{!}}Restore targets declarations temporarily (T336956)]]
* 13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 11:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 11:55 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
* 10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
* 10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
* 10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
* 09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
* 09:50 stevemunene: reboot an-test-master1002.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
* 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48492 and previous config saved to /var/cache/conftool/dbconfig/20230523-094216-root.json
* 09:42 stevemunene: reboot an-test-worker1003.eqiad.wmnet December 2022 Buster reboots [[phab:T325132|T325132]]
* 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
* 09:34 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48491 and previous config saved to /var/cache/conftool/dbconfig/20230523-092711-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48490 and previous config saved to /var/cache/conftool/dbconfig/20230523-091207-root.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48489 and previous config saved to /var/cache/conftool/dbconfig/20230523-085702-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48488 and previous config saved to /var/cache/conftool/dbconfig/20230523-085246-root.json
* 08:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately {{!}} [[phab:T214068|T214068]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1122.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48485 and previous config saved to /var/cache/conftool/dbconfig/20230523-082653-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48484 and previous config saved to /var/cache/conftool/dbconfig/20230523-082237-root.json
* 08:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] (duration: 08m 37s)
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
* 08:07 kartik@deploy1002: kartik: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:05 kartik@deploy1002: Started scap: Backport for [[gerrit:922464{{!}}Special:Contribute: Correct language code for Albanian (T327868)]]
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48480 and previous config saved to /var/cache/conftool/dbconfig/20230523-075227-root.json
* 07:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:51 hashar@deploy1002: Started deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:47 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] (duration: 07m 19s)
* 07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]] (duration: 00m 07s)
* 07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion {{!}} [[phab:T214068|T214068]]
* 07:41 marostegui@deploy1002: marostegui: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:39 marostegui@deploy1002: Started scap: Backport for [[gerrit:922389{{!}}Revert "db-production.php: Disable writes in es5"]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48479 and previous config saved to /var/cache/conftool/dbconfig/20230523-073841-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48478 and previous config saved to /var/cache/conftool/dbconfig/20230523-073722-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T337285|T337285]]', diff saved to https://phabricator.wikimedia.org/P48477 and previous config saved to /var/cache/conftool/dbconfig/20230523-073710-root.json
* 07:36 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T337285|T337285]]
* 07:25 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] (duration: 07m 16s)
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
* 07:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:17 marostegui@deploy1002: Started scap: Backport for [[gerrit:922459{{!}}db-production.php: Disable writes in es5 (T337285)]]
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337285|T337285]]
* 07:14 kartik@deploy1002: Finished scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] (duration: 09m 42s)
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48475 and previous config saved to /var/cache/conftool/dbconfig/20230523-070713-root.json
* 07:06 kartik@deploy1002: kartik: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48474 and previous config saved to /var/cache/conftool/dbconfig/20230523-070547-root.json
* 07:04 kartik@deploy1002: Started scap: Backport for [[gerrit:921049{{!}}Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)]]
* 07:00 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] (duration: 06m 58s)
* 06:54 marostegui@deploy1002: marostegui: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 06:53 marostegui@deploy1002: Started scap: Backport for [[gerrit:922387{{!}}Revert "db-production: Disable es4 writes"]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48473 and previous config saved to /var/cache/conftool/dbconfig/20230523-065042-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change es1020 weight', diff saved to https://phabricator.wikimedia.org/P48472 and previous config saved to /var/cache/conftool/dbconfig/20230523-064850-root.json
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48471 and previous config saved to /var/cache/conftool/dbconfig/20230523-064820-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48470 and previous config saved to /var/cache/conftool/dbconfig/20230523-064729-root.json
* 06:46 marostegui: Starting es4 eqiad failover from es1021 to es1020 - [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 [[phab:T337283|T337283]]', diff saved to https://phabricator.wikimedia.org/P48469 and previous config saved to /var/cache/conftool/dbconfig/20230523-063836-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337283|T337283]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48468 and previous config saved to /var/cache/conftool/dbconfig/20230523-063538-root.json
* 06:26 marostegui@deploy1002: Finished scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] (duration: 08m 21s)
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48467 and previous config saved to /var/cache/conftool/dbconfig/20230523-062033-root.json
* 06:19 marostegui@deploy1002: marostegui: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 06:18 marostegui@deploy1002: Started scap: Backport for [[gerrit:922376{{!}}db-production: Disable es4 writes (T337283)]]
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48466 and previous config saved to /var/cache/conftool/dbconfig/20230523-060528-root.json
* 06:04 kart_: cxserver: Remove Flores MT service ([[phab:T331505|T331505]])
* 06:03 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48465 and previous config saved to /var/cache/conftool/dbconfig/20230523-055024-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48464 and previous config saved to /var/cache/conftool/dbconfig/20230523-053519-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48463 and previous config saved to /var/cache/conftool/dbconfig/20230523-052014-root.json
* 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.8 (duration: 02m 17s)
* 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]] (duration: 49m 04s)
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.10  refs [[phab:T330216|T330216]]
* 02:57 eileen: civicrm upgraded from {{Gerrit|3329155a}} to {{Gerrit|6642b602}}
* 02:22 eileen: civicrm upgraded from {{Gerrit|7eae24d5}} to {{Gerrit|3329155a}}


== 2020-02-20 ==
== 2023-05-22 ==
* 23:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245787|T245787]] [nlwiki] Add noindex for NS_USER and NS_USER_TALK (duration: 00m 56s)
* 23:29 eileen: civicrm upgraded from {{Gerrit|cc9593d0}} to {{Gerrit|7eae24d5}}
* 23:46 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgVectorPrintLogo for back-compat., not read since wmf.19 (duration: 00m 56s)
* 23:16 zabe@deploy1002: Finished scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] (duration: 06m 58s)
* 23:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw232[0-4].codfw.wmnet
* 23:11 zabe@deploy1002: zabe: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 23:45 mutante: gerrit1002 - test VM - rebooting for new disk
* 23:09 zabe@deploy1002: Started scap: Backport for [[gerrit:921614{{!}}Enable VE on new wikis]]
* 23:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw231[7-9].codfw.wmnet
* 21:38 sbassett: Deployed security mitigations for [[phab:T333140|T333140]] and [[phab:T336027|T336027]]
* 23:33 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw232[0-4].codfw.wmnet
* 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet
* 23:32 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw231[7-9].codfw.wmnet
* 20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:32 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2381[7-9].codfw.wmnet
* 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 23:25 mutante: ganeti1003 - adding another virtual 20G disk to gerrit1002 ([[phab:T243808|T243808]])
* 20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:51 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 23:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet
* 23:04 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/includes/pager/IndexPager.php: IndexPager: Limit offset params to the max of the indices available (duration: 00m 56s)
* 20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 22:28 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad
* 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
* 22:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 22:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:33 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1005.eqiad.wmnet
* 22:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8908dd1]: daemons: Install stack printing signal handler on SIGUSR1 (duration: 05m 05s)
* 20:27 TheresNoTime: close UTC late backport window
* 22:23 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8908dd1]: daemons: Install stack printing signal handler on SIGUSR1
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] (duration: 07m 47s)
* 21:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245780|T245780]] [mediawikiwiki] Deny the 'flow-hide' right to logged out and non-autoconfirmed users (duration: 00m 56s)
* 20:17 samtar@deploy1002: samtar and superpes: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:07 James_F: Train 1.35.0-wmf.20 provisionally looks OK on all wikis. Closing [[phab:T233868|T233868]].
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:921765{{!}}[kaawiki] Enable SandboxLink extension (T336648)]]
* 20:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.20
* 20:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] (duration: 08m 22s)
* 19:55 twentyafterfour: hotfix deployed
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 19:51 twentyafterfour: deploying phabricator hotfix: https://phabricator.wikimedia.org/rPHEX2f36eee7ce67eb0c09e9bb0e79b42fc3b41d3597 for [[phab:T244165|T244165]]
* 20:09 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
* 19:33 bblack: codfw+ulsfo repooled in geodns
* 20:08 samtar@deploy1002: superpes and samtar: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 18:20 fdans@deploy1001: Finished deploy [analytics/refinery@e05ae16]: deploying refinery (duration: 11m 31s)
* 20:06 samtar@deploy1002: Started scap: Backport for [[gerrit:921764{{!}}[ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)]]
* 18:08 fdans@deploy1001: Started deploy [analytics/refinery@e05ae16]: deploying refinery
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 17:38 bblack: pushed codfw+ulsfo geodns depool
* 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:45 jynus: stop, upgrade and restart dbprov2002
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:26 jynus: stop, upgrade and restart dbprov1002
* 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:23 moritzm: installing Java security updates on Hadoop/Kafka Jumbo/AQS/Druid
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:16 jynus: stop, upgrade and restart db1140
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:12 moritzm: installing postgres security updates on netboxdb*
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:03 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@125cffa]: deploying aqs, third time is the charm (duration: 06m 15s)
* 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:57 fdans@deploy1001: Started deploy [analytics/aqs/deploy@125cffa]: deploying aqs, third time is the charm
* 17:04 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) (duration: 00m 17s)
* 15:40 marostegui: Poweroff es2022 [[phab:T245714|T245714]]
* 17:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided)
* 15:32 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@95a7999]: deploying aqs (duration: 00m 48s)
* 16:58 XioNoX: push mgmt_junos to all L2 switches
* 15:32 fdans@deploy1001: Started deploy [analytics/aqs/deploy@95a7999]: deploying aqs
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2009.codfw.wmnet
* 15:23 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@cbc3241]: deploying aqs (duration: 04m 06s)
* 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
* 15:19 fdans@deploy1001: Started deploy [analytics/aqs/deploy@cbc3241]: deploying aqs
* 15:57 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
* 14:38 Urbanecm: [dry-run; mwmaint1002] foreachwiki extensions/AbuseFilter/maintenance/fixOldLogEntries.php --dry-run --verbose ([[phab:T228655|T228655]])
* 15:56 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2009.codfw.wmnet
* 12:53 moritzm: installing PHP updates on matomo1001/piwik
* 15:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
* 12:28 moritzm: installing PHP 7.0 security updates
* 15:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
* 12:11 Urbanecm: EU SWAT done
* 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|728d739}}: Configure logo for ngwikimedia ([[phab:T242416|T242416]]) (duration: 01m 04s)
* 15:25 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 12:05 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|64240e1}}: Add logos for ngwikimedia ([[phab:T242416|T242416]]) (duration: 01m 04s)
* 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 11:19 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 15:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - [[phab:T241049|T241049]]"
* 11:08 moritzm: installing boost update from Buster point release
* 14:32 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10468 and previous config saved to /var/cache/conftool/dbconfig/20200220-105117-marostegui.json
* 14:31 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 10:12 Reedy: created $wikidb.blobs_cluster27 on es1023 - [[phab:T245720|T245720]]
* 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 10:08 Reedy: created $wikidb.blobs_cluster26 on es1020 - [[phab:T245720|T245720]]
* 14:10 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 10:08 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 04s)
* 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm
* 09:42 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 03s)
* 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 09:27 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 01s)
* 12:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10467 and previous config saved to /var/cache/conftool/dbconfig/20200220-091233-marostegui.json
* 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm
* 09:02 akosiaris: restart etherpad-lite on etherpad1002 [[phab:T244238|T244238]]
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor1003.eqiad.wmnet with OS bookworm
* 09:00 marostegui: Restart m1 database master db1135 (etherpad will not be available for around 1 minute) - [[phab:T244238|T244238]]
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 08:40 jynus: disable puppet and stop bacula service [[phab:T244238|T244238]]
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P48456 and previous config saved to /var/cache/conftool/dbconfig/20230522-115936-root.json
* 08:35 marostegui: Upgrade mysql on db1135 without restart [[phab:T244238|T244238]]
* 11:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
* 07:47 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q15k (was Q10k) ([[phab:T225057|T225057]]) - in case of cache issues (duration: 01m 03s)
* 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor1003.eqiad.wmnet with OS bookworm
* 07:46 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q15k (was Q10k) ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path [[phab:T337220|T337220]]
* 07:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q10k (was Q8k) ([[phab:T225057|T225057]]) - in case of cache issue (duration: 01m 01s)
* 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path
* 07:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q10k (was Q8k) ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 10:06 hashar@deploy1002: Finished scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] (duration: 37m 00s)
* 07:17 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8000 ([[phab:T225057|T225057]]) - in case of cache issue (duration: 01m 03s)
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet
* 07:15 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8000 ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 07:01 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6000 ([[phab:T225057|T225057]]) - extra sync for cache issue (duration: 01m 04s)
* 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 07:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6000 ([[phab:T225057|T225057]]) (duration: 01m 06s)
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors
* 06:46 vgutierrez: test trafficserver 8.0.6-rc1 in cp30[64,65]
* 10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10466 and previous config saved to /var/cache/conftool/dbconfig/20200220-062445-marostegui.json
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:17 marostegui: Repool labsdb1011
* 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 06:12 marostegui: Remove partitions from db1101:3318 - [[phab:T239453|T239453]]
* 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 to remove revision partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10465 and previous config saved to /var/cache/conftool/dbconfig/20200220-061213-marostegui.json
* 10:02 moritzm: installing updated usb.ids packages for Bullseye
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 this host already had the partitions removed - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10464 and previous config saved to /var/cache/conftool/dbconfig/20200220-061019-marostegui.json
* 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 to remove revision partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10463 and previous config saved to /var/cache/conftool/dbconfig/20200220-060914-marostegui.json
* 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 on s8, db1099:3318 back to its original weight', diff saved to https://phabricator.wikimedia.org/P10462 and previous config saved to /var/cache/conftool/dbconfig/20200220-055943-marostegui.json
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet
* 00:22 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571860{{!}}Allow non-autoconfirmed users to propose OAuth apps (T213760)]] (duration: 01m 04s)
* 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 00:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573397{{!}}Enable password-reset (requireemail pref) on test WD and Commons (T245660)]] (duration: 01m 03s)
* 09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors
* 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
* 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor1003.eqiad.wmnet
* 09:39 hashar@deploy1002: hashar: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:29 hashar@deploy1002: Started scap: Backport for [[gerrit:921558{{!}}Revert "[WikibaseMediaInfo] Add 'main subject of' property"]]
* 08:46 marostegui: Stop mysql on db2160 (haproxy irc alerts will be generated)
* 08:28 elukey: drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin
* 08:22 moritzm: installing systemd security updates
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json
* 07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json
* 07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json
* 07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T337206|T337206]]', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json
* 06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 06:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json
* 06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 [[phab:T337204|T337204]]', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T337204|T337204]]
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primary [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json
* 05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 [[phab:T337203|T337203]]', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 [[phab:T337203|T337203]]
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json


== 2020-02-19 ==
== 2023-05-21 ==
* 23:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw138[0-3].eqiad.wmnet
* 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw137[4-9].eqiad.wmnet
* 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 23:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet
* 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 23:28 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: cirrus: Reduce CirrusSearch-MoreLike cache workers and queue back to normal (duration: 01m 03s)
* 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 23:26 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw138[0-3].eqiad.wmnet
* 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 23:26 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw137[4-9].eqiad.wmnet
* 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 23:25 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1363.eqiad.wmnet
* 23:23 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: redirect more_like from codfw back to eqiad (duration: 01m 04s)
* 23:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:57 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c16c63a]: articletopic thresholding for ores scores and eventgate port update (duration: 00m 57s)
* 22:56 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c16c63a]: articletopic thresholding for ores scores and eventgate port update
* 22:54 robh: cp3050 & cp3051 returned to service via [[phab:T243167|T243167]]
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgServer to protocol-relative for Wikitech and Test Wikitech (duration: 01m 05s)
* 22:37 robh: taking cp3050 & cp3051 offline for firmware update via [[phab:T243167|T243167]]
* 22:23 mutante: phabricator - upgrading PHP packages
* 22:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw231([0-6]).codfw.wmnet
* 22:12 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw231([0-6]).codfw.wmnet
* 22:11 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(6[4-9]{{!}}7[0-3]{{!}}84).eqiad.wmnet
* 22:10 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw13(6[4-9]{{!}}7[0-3]{{!}}84).eqiad.wmnet
* 22:08 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2314.codfw.wmnet
* 21:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:54 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:52 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 21:48 bblack: all authdns servers - upgrade to gdnsd-3.2.2
* 21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:29 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:55 eileen: civicrm revision changed from {{Gerrit|52c68911c6}} to {{Gerrit|a6b222c19f}}, config revision is {{Gerrit|561ae21f77}}
* 20:15 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 06s)
* 20:13 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 20:12 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 06s)
* 20:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib: Fix statsd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 05s)
* 20:05 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.20 (duration: 01m 03s)
* 20:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.20
* 20:02 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 20:02 rzl@cumin1001: conftool action : set/weight=10; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 19:54 rlazarus: scap pull on new api servers mw13[56-62]
* 19:50 mutante: generating mcrouter certs for new codfw mw appservers
* 19:39 mutante: initial puppet run on new hosts mw231*
* 19:31 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/skins/MinervaNeue/includes/MinervaHooks.php: [[phab:T245162|T245162]] Check title value before proceeding to check if user page (duration: 01m 04s)
* 19:27 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/skins/MinervaNeue/includes/MinervaHooks.php: [[phab:T245162|T245162]] Check title value before proceeding to check if user page (duration: 01m 04s)
* 19:21 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: [[phab:T244577|T244577]] [metawiki] Disable MobileFrontend mainpage special casing (duration: 01m 04s)
* 19:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244369|T244369]] [trwiki] Enable the WikidataPageBanner extension (duration: 01m 05s)
* 19:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: [[phab:T245570|T245570]] resourceloader: fix SqlDependencyModuleStore::setMulti() to use upsert() (duration: 01m 01s)
* 18:45 bblack: dns4001 - upgraded to gdnsd-3.2.2
* 18:44 bblack: reprepro: upload gdnsd 3.2.2-1~wmf1 to buster-wikimedia
* 18:39 mutante: mwmaint1002 - sudo systemctl reset-failed to clear systemd alerts
* 18:38 mutante: mwmaint1002 - removing Icinga ACK for systemd state - comments for it were from HHVM removal in Oct 2019
* 18:26 mutante: phab2001 - upgraded ssh-server, kept locally modified config; apt autoremove removes python3-debconf
* 18:23 mutante: phab2001 - installing package upgrades, incl. openssh, PHP version
* 18:22 mutante: phab2001 - upgrading mariadb client package versions
* 18:19 mutante: removing problem ACK from Icinga alerts for wikitech-static MediaWiki version. comments were about things in 2019
* 17:48 robh: cp1089 cp1090 returned to service via [[phab:T243167|T243167]]
* 17:40 jynus: starting data check between db1078 and db1140:3313 [[phab:T244958|T244958]]
* 17:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q4000 ([[phab:T225057|T225057]]) (just in case of cache issue) (duration: 01m 04s)
* 17:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q4000 ([[phab:T225057|T225057]]) (duration: 01m 01s)
* 17:14 ema: cp4026: repool after probe Connection:keep-alive experiment revert https://gerrit.wikimedia.org/r/573337
* 17:12 robh: cp1088 returned to service, cp1089 & cp1090 offline for firmware update via [[phab:T243167|T243167]]
* 16:44 papaul: replacing ps1-a8-codfw; mgmt in rack A8 will go down
* 16:37 otto@deploy1001: Finished deploy [analytics/refinery@e23918a]: Updating eventgate-analytics port ([[phab:T245203|T245203]]) and also eventlogging whitelist (duration: 12m 27s)
* 16:32 ema: depool cp4026, 5xx
* 16:24 otto@deploy1001: Started deploy [analytics/refinery@e23918a]: Updating eventgate-analytics port ([[phab:T245203|T245203]]) and also eventlogging whitelist
* 16:13 marostegui: Depool labsdb1011 to help replication to catch up
* 16:05 elukey: Update analytics-in4 filter term eventgate for [[phab:T245203|T245203]] on cr1/cr2 eqiad
* 15:48 ariel@deploy1001: Finished deploy [dumps/dumps@b42acb5]: fix temp stub generation, add pagerangeinfo cache, some unit tests (duration: 00m 03s)
* 15:48 ariel@deploy1001: Started deploy [dumps/dumps@b42acb5]: fix temp stub generation, add pagerangeinfo cache, some unit tests
* 14:59 marostegui: Stop mysql on es2021 - [[phab:T243052|T243052]]
* 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 marostegui: Data checksum on db1084 [[phab:T245621|T245621]]
* 14:07 marostegui: Upgrade and reboot db1084 - [[phab:T245621|T245621]]
* 14:02 marostegui: Start mysql on db1084 without replication - [[phab:T245621|T245621]]
* 13:53 jbond42: disable puppet to upgrade postgresql
* 13:30 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1084, lots of connection errors', diff saved to https://phabricator.wikimedia.org/P10458 and previous config saved to /var/cache/conftool/dbconfig/20200219-133057-jynus.json
* 12:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573236{{!}}Start reading for the new term store for clients up to Q2000 (T225057)]], take II, the cache issue (duration: 01m 04s)
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573236{{!}}Start reading for the new term store for clients up to Q2000 (T225057)]] (duration: 01m 06s)
* 11:56 volans: better splay of periodic scripts that interact with Netbox - [[phab:T244291|T244291]]
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib/includes/Store: Get rid of useless metrics in EntityTermLookupBase ([[phab:T245592|T245592]]) (duration: 01m 04s)
* 11:06 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib/includes/Store: Get rid of useless metrics in EntityTermLookupBase ([[phab:T245592|T245592]]) (duration: 01m 12s)
* 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:58 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:58 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 jynus: upgrading mariadb client on cumin hosts
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3315, db2089:3316 after new package testing', diff saved to https://phabricator.wikimedia.org/P10457 and previous config saved to /var/cache/conftool/dbconfig/20200219-103806-marostegui.json
* 10:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:17 jynus: stopping db2089 mariadb@s5
* 10:12 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw135[0-5]*.eqiad.wmnet
* 10:12 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw135[0-5]*.eqiad.wmnet
* 10:11 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1349.eqiad.wmnet
* 10:11 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw1349.eqiad.wmnet
* 10:09 moritzm: updated tftpboot environment for stretch-bootif for the 9.12 point release [[phab:T241359|T241359]]
* 09:53 jynus: stopping and upgrading db1140 instances
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3315, db2089:3316 for new package testing', diff saved to https://phabricator.wikimedia.org/P10455 and previous config saved to /var/cache/conftool/dbconfig/20200219-095139-marostegui.json
* 09:51 marostegui: Depool db2089:3315, db2089:3316 for new package testing
* 09:49 akosiaris: [[phab:T245516|T245516]]. Deploy mathoid chart version 0.0.27, removing logstash gelf configuration
* 09:46 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 09:43 vgutierrez: test trafficserver 8.0.6-rc1 in cp40[26,32]
* 09:34 _joe_: cleared opcache on mw1313
* 09:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 09:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:33 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 08:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:50 marostegui: Remove dbproxy1007 grants from m2 - [[phab:T231280|T231280]]
* 08:41 marostegui: Remove wikiadmin2 user from s7 - [[phab:T243512|T243512]]
* 08:23 Urbanecm: run mwscript deleteEqualMessages.php cswiki --delete
* 08:14 godog: roll restart swift proxies - [[phab:T244776|T244776]]
* 07:02 marostegui: Remove wikiadmin2 user from es2 - [[phab:T243512|T243512]]
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 50 -> 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10454 and previous config saved to /var/cache/conftool/dbconfig/20200219-065726-marostegui.json
* 06:35 marostegui: Compress watchlist_expiry table on s3 (this will take hours as I have left a 60 seconds sleep between tables) - [[phab:T245358|T245358]]
* 06:17 marostegui: Compress new and empty watchlist_expiry table - [[phab:T245358|T245358]]
* 01:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1353.eqiad.wmnet
* 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1354.eqiad.wmnet
* 01:22 mutante: mw1353 - restarted apache (some race condition on new installs, 5 other servers did not have the issue)
* 01:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
* 01:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1350.eqiad.wmnet
* 01:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1351.eqiad.wmnet
* 01:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1352.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1355.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1354.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1350.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1353.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1351.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1352.eqiad.wmnet
* 01:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T240728|T240728]] Fix Latin Wikipedia (VICIPÆDIA) wordmark and set size correctly (duration: 01m 06s)
* 01:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:43 James_F: Manually purged https://en.wikipedia.org/images/mobile/copyright/wikipedia-wordmark-la.svg and .png from Varnish for [[phab:T240728|T240728]]
* 00:41 jforrester@deploy1001: Synchronized static/images/mobile/copyright/: [[phab:T240728|T240728]] Sync logo images (duration: 01m 04s)
* 00:40 mutante: mw1351 through mw1355 - initial puppet runs - new appservers
* 00:36 niharika29@deploy1001: Synchronized static/images/mobile/copyright/: Remove unnecessary id from wordmark (duration: 01m 03s)
* 00:34 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adjust MT Threshold for Assamese to 70% - [[phab:T245509|T245509]] (duration: 01m 04s)
* 00:24 niharika29@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikimediaEvents/: Follow up on authevents statsd changes in {{Gerrit|I7612b68fe}} (duration: 01m 03s)
* 00:21 niharika29@deploy1001: Synchronized wmf-config/logging.php: Update authmanager-statsd channel name (duration: 01m 03s)
* 00:16 eileen: civicrm revision changed from {{Gerrit|8c77e9e915}} to {{Gerrit|52c68911c6}}, config revision is {{Gerrit|561ae21f77}}
* 00:10 niharika29@deploy1001: Synchronized wmf-config/logging.php: Make the logstash and authmanager-statsd Monolog handlers compatible (duration: 01m 04s)
* 00:08 mutante: creating mcrouter certs for mw1350


== 2020-02-18 ==
== 2023-05-20 ==
* 23:56 mutante: mw1349 - scap pull
* 18:25 effie: restart varnish cp3061
* 23:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1349.eqiad.wmnet
* 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
* 23:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1349.eqiad.wmnet
* 15:17 hoo@deploy1002: Finished scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] (duration: 08m 47s)
* 23:34 maryum: running reindex on mwmaint1002 - [[phab:T194448|T194448]]
* 15:10 hoo@deploy1002: hoo: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:28 maryum: running reindex for wikimedia wikis
* 15:08 hoo@deploy1002: Started scap: Backport for [[gerrit:921549{{!}}Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)]]
* 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet
* 23:12 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2151.wmnet
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:12 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2150.wmnet
* 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 23:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
* 22:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Enable ores_articletopics field creation for all wikis (extra sync for [[phab:T236104|T236104]]) (duration: 01m 04s)
* 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 22:54 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Enable ores_articletopics field creation for all wikis (duration: 01m 03s)
* 22:52 chaomodus: completed upgrading Netbox to 2.7.4 [[phab:T244291|T244291]]
* 22:51 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part3) (duration: 00m 11s)
* 22:51 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part3)
* 22:49 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part2) (duration: 01m 19s)
* 22:48 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part2)
* 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (duration: 01m 19s)
* 22:45 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]]
* 22:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244185|T244185]] Raise minimum log level for 'OAuth' from DEBUG to INFO (duration: 01m 04s)
* 22:30 chaomodus: Upgrading Netbox to 2.7.4
* 21:56 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:54 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 21:26 XioNoX: rollback tcp-mss clamping in eqiad/eqord
* 21:07 jeh: power down and set Icinga downtime on cloudvirt1022 [[phab:T243536|T243536]]
* 21:07 jeh: power down and set Icinga downtime on cloudvirt1022 [[phab:T241884|T241884]]
* 20:54 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventStreamConfig extension on metawiki - [[phab:T242122|T242122]] (duration: 01m 03s)
* 20:47 ppchelko@deploy1001: Finished deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group [[phab:T244387|T244387]] (duration: 07m 59s)
* 20:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventStreamConfig extension on testwiki - [[phab:T242122|T242122]] (duration: 01m 04s)
* 20:39 ppchelko@deploy1001: Started deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group [[phab:T244387|T244387]]
* 20:06 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/libs/StatusValue.php: [[phab:T245155|T245155]] StatusValue: Fix __toString() to not choke on special parameters (duration: 01m 04s)
* 20:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.20 [[phab:T233868|T233868]]
* 19:52 jforrester@deploy1001: Finished scap: testwiki to 1.35.0-wmf.20 and re-build l10n cache [[phab:T233868|T233868]] (duration: 61m 01s)
* 19:41 papaul: shutting down dns2001 for 10G card troubleshooting
* 19:30 James_F: Running `foreachwiki sql.php php-1.35.0-wmf.19/maintenance/archives/patch-watchlist_expiry.sql` for [[phab:T244631|T244631]]
* 18:51 jforrester@deploy1001: Started scap: testwiki to 1.35.0-wmf.20 and re-build l10n cache [[phab:T233868|T233868]]
* 18:49 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.18 (duration: 15m 29s)
* 18:25 James_F: Running `scap prep` for 1.35.0-wmf.20 ref. [[phab:T233868|T233868]]
* 18:01 James_F: 1.35.0-wmf.20 was branched at {{Gerrit|c664b4f1b933d110bd69f074c399695bd6b17d13}} for [[phab:T233868|T233868]]
* 18:01 marxarelli: completed promotion of 1.35.0-wmf.19 to all wikis ([[phab:T233867|T233867]])
* 17:52 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Re-roll all wikis to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 17:47 marxarelli: re-rolling wmf.19 to all wikis ([[phab:T233867|T233867]]) with eyes particularly on ([[phab:T245202|T245202]])
* 17:28 bblack: cp3 (esams edge) - revert GRE MTU mitigations - [[phab:T232602|T232602]]
* 17:00 papaul: resetting ps1-a8-codfw, see [[phab:T245164|T245164]]
* 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:12 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:11 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 16:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:08 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 16:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:02 ottomata: deploying new 'canary' and 'production' releases for eventgate-main. (These releases use a new nodePort, and so will not be active until LVS is modified. The old 'main' release and nodePort are left as is.) - [[phab:T242861|T242861]]
* 16:02 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 15:51 bblack: dns2001 - shutdown for hw/reimage work - [[phab:T242017|T242017]]
* 15:47 bblack: dns2001 - stopping bgp to drain service for hw/reimage work - [[phab:T242017|T242017]]
* 15:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 15:40 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:36 jynus: stopping db1140:s3 instance
* 15:35 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 15:34 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:34 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:14 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:08 vgutierrez@puppetmaster1001: conftool action : set/weight=100; selector: dc=eqiad,cluster=cache_text,service=ats-be,name=cp1089.eqiad.wmnet
* 15:04 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:56 bblack: esams repooled in DNS
* 14:54 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:54 ottomata: deploying new 'canary' and 'production' releases for eventgate-analytics. (These releases use a new nodePort, and so will not be active until LVS is modified. The old 'analytics' release and nodePort are left as is.) - [[phab:T242861|T242861]]
* 14:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:39 XioNoX: remove cr2-esams VRRP handicap - [[phab:T243080|T243080]]
* 14:34 XioNoX: restore default esams-eqiad link cost - [[phab:T243080|T243080]]
* 14:33 XioNoX: re-enable cr2-esams BGP transit/peering - [[phab:T243080|T243080]]
* 14:31 XioNoX: cr2-esams - request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 14:29 XioNoX: re-disable cr2-esams BGP group IX4 - [[phab:T243080|T243080]]
* 14:14 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/DiscussionTools: [[gerrit:572882{{!}}wmf.18: Add config option and query parameter to control loading]] (duration: 01m 11s)
* 14:02 cdanis: depool esams
* 14:01 XioNoX: re-enable cr2-esams BGP group IX4 - [[phab:T243080|T243080]]
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 25 -> 50 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10448 and previous config saved to /var/cache/conftool/dbconfig/20200218-135525-marostegui.json
* 13:44 XioNoX: installing OS on cr2-esams:re0 - [[phab:T243080|T243080]]
* 13:39 XioNoX: cr2-esams - request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 13:37 XioNoX: deactivate peering/transit on cr2-esams - [[phab:T243080|T243080]]
* 13:24 XioNoX: reboot cr2-esams:re1 (backup) - [[phab:T243080|T243080]]
* 13:23 XioNoX: bump cost of eqiad-esams transport - [[phab:T243080|T243080]]
* 13:10 XioNoX: fail vrrp master to cr3-esams - [[phab:T243080|T243080]]
* 12:58 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 12:55 Amir1: EU SWAT done
* 12:53 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572731{{!}}Add DiscussionTools to four wikis in hidden mode (T244870)]], take II (duration: 01m 03s)
* 12:52 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572731{{!}}Add DiscussionTools to four wikis in hidden mode (T244870)]] (duration: 01m 04s)
* 12:45 XioNoX: remove graceful-switchover and nonstop-routing from cr2-esams - [[phab:T243080|T243080]]
* 12:36 XioNoX: push new Junos to cr2-esams:re1 (backup RE, noop) - [[phab:T243080|T243080]]
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part II (duration: 01m 03s)
* 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part I, take II (the cache issue) (duration: 01m 04s)
* 12:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part I (duration: 01m 06s)
* 12:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572628{{!}}Start reading for the new term store for clients up to Q1000 (T225057)]] (duration: 01m 05s)
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4b193dd}}: Increase Commons linkpurge rate limit for patrollers ([[phab:T245214|T245214]]) (duration: 01m 31s)
* 11:51 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:48 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:47 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:43 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:41 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:35 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:27 jynus: reenabling prometheus exporter metadata user for prometheus1003
* 11:10 jynus: temp. disabling prometheus exporter metadata user for prometheus1003
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 15 -> 25 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10445 and previous config saved to /var/cache/conftool/dbconfig/20200218-104958-marostegui.json
* 09:27 gehel: re-enable puppet on mw* - [[phab:T222321|T222321]]
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10444 and previous config saved to /var/cache/conftool/dbconfig/20200218-091343-marostegui.json
* 09:09 gehel: disabling puppet on mw* to deploy apache config change - [[phab:T222321|T222321]]
* 09:07 volans: rm /var/log/exim4/paniclog on cumin1001 to clear last week's OOM error
* 08:59 marostegui: Remove wikiadmin2 grants from es1 [[phab:T243512|T243512]]
* 08:59 marostegui: Remove wikiadmin2 grants from es1
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options', diff saved to https://phabricator.wikimedia.org/P10443 and previous config saved to /var/cache/conftool/dbconfig/20200218-085713-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10442 and previous config saved to /var/cache/conftool/dbconfig/20200218-082306-marostegui.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10441 and previous config saved to /var/cache/conftool/dbconfig/20200218-080952-marostegui.json
* 08:08 marostegui: Restart MySQL to pick up optimizer_switch changes - [[phab:T245489|T245489]]
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10440 and previous config saved to /var/cache/conftool/dbconfig/20200218-080623-marostegui.json
* 07:34 elukey: powercycle analytics1065 (crashed hours ago, no mgmt console available, no ssh)
* 06:39 marostegui: Remove wikiadmin2 from pc1007, pc1008, pc1009 and pc1010 [[phab:T243512|T243512]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 100 -> 200 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10439 and previous config saved to /var/cache/conftool/dbconfig/20200218-063819-marostegui.json
* 06:27 marostegui: Stop haproxy on dbproxy1007 - [[phab:T245385|T245385]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 and weight 10 in API for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10438 and previous config saved to /var/cache/conftool/dbconfig/20200218-062459-marostegui.json
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission


== 2020-02-17 ==
== 2023-05-19 ==
* 19:56 cdanis: finish enabling TCP-MSS clamping in eqiad
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:49 cdanis: s/no-op//
* 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 19:49 cdanis: no-op enable TCP-MSS clamping on eqord and eqiad
* 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
* 19:33 cdanis: no-op enable flowspec change on cr2-eqord and cr2-eqiad
* 21:19 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:25 elukey: restart kafka on kafka-jumbo1001 to pick up new openjdk updates
* 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1495.eqiad.wmnet
* 17:25 bblack: GRE MTU mitigations applied to esams cp hosts only - [[phab:T232602|T232602]]
* 19:46 mutante: mw1469 - sudo pkill ffmpeg (per runbook)
* 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1469.eqiad.wmnet
* 15:50 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:45 mutante: depooled mw1469 from videoscaler, dedicating to just jobrunner
* 15:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1469.eqiad.wmnet
* 15:48 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:36 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b34c529]: (no justification provided) (duration: 00m 09s)
* 15:44 cdanis: ✔️ cdanis@icinga1001.wikimedia.org ~ 🕥☕ sudo systemctl restart ircecho
* 19:36 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b34c529]: (no justification provided)
* 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10422 and previous config saved to /var/cache/conftool/dbconfig/20200217-143146-marostegui.json
* 16:55 mutante: mw2448 - scap pull - [[phab:T2334429|T2334429]]
* 14:17 ema: reprepro includedeb buster-wikimedia ~ema/cadvisor_0.35.0+ds1-4_amd64.deb [[phab:T183146|T183146]]
* 15:31 taavi@deploy1002: Finished scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] (duration: 22m 02s)
* 12:34 XioNoX: add test flowspec rules to cr3-knams
* 15:21 taavi@deploy1002: legoktm and taavi: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 12:34 moritzm: installing postgresql-9.4 security updates
* 15:09 taavi@deploy1002: Started scap: Backport for [[gerrit:921150{{!}}i18n: Add link to help page (T322717)]], [[gerrit:921326{{!}}Enable RealMe (T324535)]]
* 12:27 vgutierrez: reboot acmechief instances (kernel upgrade)
* 15:06 legoktm@deploy1002: Finished scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] (duration: 09m 46s)
* 10:31 jynus: dropping all databases from db1140:3313
* 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): ' db1107 increase API weight from 10 to 15 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10420 and previous config saved to /var/cache/conftool/dbconfig/20200217-102218-marostegui.json
* 14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
* 10:20 vgutierrez: rolling restart of ats-tls and varnish-fe on ulsfo to enable KA between them - [[phab:T244464|T244464]]
* 14:58 legoktm@deploy1002: legoktm: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 10:00 moritzm: installing Linux 4.9.210 kernels on stretch systems
* 14:57 legoktm@deploy1002: Started scap: Backport for [[gerrit:921252{{!}}Disable GWToolset from Commons (T270911)]]
* 09:10 godog: correction, +100G
* 14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 09:09 godog: +10G to prometheus/ops fs on prometheus eqiad - [[phab:T245361|T245361]]
* 14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 09:06 godog: +50G to prometheus/ops fs on prometheus eqiad - [[phab:T245361|T245361]]
* 14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
* 07:22 marostegui: Stop haproxy on dbproxy1002 - [[phab:T245384|T245384]]
* 14:35 sukhe: enable puppet on A:lvs, finished rolling out change
* 14:20 sukhe: disable puppet on A:lvs to roll out CR 910566
* 14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
* 13:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 10s)
* 13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
* 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
* 13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad ([[phab:T322937|T322937]])
* 13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs [[phab:T330215|T330215]]
* 12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
* 11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
* 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
* 10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
* 10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
* 10:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
* 10:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
* 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
* 10:07 moritzm: installing ncurses security updates
* 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
* 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
* 09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
* 09:15 mvernon@cumin2002: START - Cookbook sre.dns.netbox
* 09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
* 08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
* 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
* 08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
* 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
* 08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia [[phab:T330884|T330884]]
* 08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
* 07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
* 07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
* 07:11 moritzm: installing emacs security updates
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
* 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
* 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
* 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json