
Server Admin Log: Difference between revisions

imported>Stashbot
(vgutierrez: restart ats-tls on cp5008.eqsin.wmnet - T249335)
imported>Stashbot
(hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s))
 
(970 intermediate revisions by 4 users not shown)
== 2020-04-12 ==
== 2023-03-25 ==
* 11:11 vgutierrez: restart ats-tls on cp5008.eqsin.wmnet - [[phab:T249335|T249335]]
* 07:54 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s)
* 10:18 elukey: restart wdqs-updater on wdqs1004 (logs show no reports from the past hours; the last ones were stack traces related to a json decode failure)
* 07:54 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0
* 06:59 dcausse: restarting blazegraph on wdqs1004 ([[phab:T242453|T242453]])
* 00:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
* 06:35 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1025.eqiad.wmnet
* 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
* 06:32 elukey: powerdown restbase1025 - [[phab:T250027|T250027]]
* 00:57 mutante: doc1002 - issue is mismatched UIDs again, most likely. doc-uploader is debmonitor on new host
* 06:21 elukey: powercycle restbase1025 (not reachable, serial console shows blank, racadm getsel reports errors with DIMM_B2)
* 00:56 mutante: doc1002 - manually running rsync to doc2002 - which failed with status 23 when started by timer
* 05:53 bblack: pushing https://gerrit.wikimedia.org/r/588134 to cache_text
* 00:09 tzatziki: removing 2 files for legal compliance
* 05:50 vgutierrez: restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet - [[phab:T249335|T249335]]


== 2020-04-11 ==
== 2023-03-24 ==
* 19:52 cdanis@cumin1001: dbctl commit (dc=all): 'slight deweight to db1111', diff saved to https://phabricator.wikimedia.org/P10960 and previous config saved to /var/cache/conftool/dbconfig/20200411-195235-cdanis.json
* 23:58 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 17:35 cdanis@cumin1001: dbctl commit (dc=all): 's8: +weight db1111, -weight db1126', diff saved to https://phabricator.wikimedia.org/P10959 and previous config saved to /var/cache/conftool/dbconfig/20200411-173517-cdanis.json
* 23:57 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 15:39 vgutierrez: restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet - [[phab:T249335|T249335]]
* 23:50 tzatziki: removing 1 file for legal compliance
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 21:08 mutante: mwmaint1002 ferm rules for rsyncd_access from miscweb removed by puppet after {{Gerrit|I4fe17f397856361}} which reverted a8af0339bde14018e8. manually deleted rsyncd config and stopped rsync service. complete noop on mwmaint2002 which is currently the active mwmaint server. [[phab:T328907|T328907]]
* 09:20 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 18:50 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable (duration: 00m 13s)
* 07:01 vgutierrez: restart ats-tls on cp[1079,1081,1083,1085].eqiad.wmnet - [[phab:T249335|T249335]]
* 18:50 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable
* 18:30 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags (duration: 00m 16s)
* 18:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags
* 18:00 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 20s)
* 18:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag
* 17:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 06s)
* 17:55 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag
* 15:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 15:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 15:35 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 15:09 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:59 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:24 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki wikimaniawiki "2024:Expressions of Interest" "Wikimania:Expressions of Interest" "Zabe" --reason "per request [[:phab:T332917{{!}}T332917]]" # [[phab:T332917|T332917]]
* 11:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet
* 11:44 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
* 11:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 11:01 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
* 10:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
* 10:35 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:00 marostegui: Upgrade db1204 to mariadb 10.6 [[phab:T330861|T330861]]
* 08:57 hashar: Fixed up Gerrit > GitHub replication which broke at 5:00 UTC by updating the Github RSA ssh host key [[phab:T332972|T332972]]
* 05:37 hashar: gerrit: refreshed ssh host key for `github.com`
* 05:28 hashar: Restarted Gerrit
* 05:26 hashar: Stopping Gerrit
* 05:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) (duration: 00m 10s)
* 05:26 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]])
* 05:22 hashar: Restarting gerrit replica on gerrit2002.wikimedia.org
* 05:21 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) (duration: 00m 07s)
* 05:20 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]])
* 05:17 hashar: Restarting Gerrit for deploying plugins updates
* 05:10 ejegg: Standalone SmashPig upgraded from {{Gerrit|3b84e4cb}} to {{Gerrit|50139e82}}
* 05:04 ejegg: payments-wiki upgraded from {{Gerrit|4d0c90b4}} to {{Gerrit|4b0a71fa}}
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply


== 2020-04-10 ==
== 2023-03-23 ==
* 21:12 cdanis@cumin1001: dbctl commit (dc=all): 'db1111 seems overloaded', diff saved to https://phabricator.wikimedia.org/P10954 and previous config saved to /var/cache/conftool/dbconfig/20200410-211202-cdanis.json
* 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:37 cdanis: cdanis@re0.cr1-codfw> clear bfd session address 208.80.153.220
* 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:03 vgutierrez: restart ats-tls on cp1083 and cp1085 - [[phab:T249335|T249335]]
* 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:14 hashar@deploy1001: Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 40s)
* 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:14 hashar@deploy1001: Started deploy [zuul/deploy@4a69913]: (no justification provided)
* 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:12 mutante: restarted and re-armed keyholder on deploy1001 to pick up changes for zuul scap deploy
* 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:11 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:10 mutante: Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. ([[phab:T249907|T249907]])
* 22:30 mutante: moscovium - rebooting to finalize distro release upgrade - [[phab:T332952|T332952]]
* 12:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:20 mutante: moscovium performing apt-get full-upgrade [[phab:T332952|T332952]]
* 12:10 mutante: Creating VM people1002.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=A vcpus=1 memory=2GB disk=80GB link=private. This may take a few minutes.
* 22:09 mutante: moscovium - when doing an in-place upgrade from buster to bullseye and replacing the string in sources.list, you also need to replace "bullseye-updates" with "bullseye-security" in the security.debian.org lines; the need for this is described as a bug at https://shagain.club/index.php/archives/641/ - [[phab:T327068|T327068]]
* 12:10 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:00 mutante: moscovium - apt-get full-upgrade ; apt autoremove ; replace buster with bullseye in sources.list ; repeat apt-get upgrade/full-upgrade etc. (https://wiki.debian.org/DebianUpgrade) [[phab:T327068|T327068]]
* 12:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:00 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc2002.codfw.wmnet with OS bullseye
* 12:09 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:57 mutante: moscovium - apt-get upgrade (rt.wikimedia.org going into maintenance) [[phab:T327068|T327068]]
* 11:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'canary' .
* 21:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
* 11:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
* 11:44 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 21:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
* 11:39 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 21:45 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10953 and previous config saved to /var/cache/conftool/dbconfig/20200410-094359-marostegui.json
* 21:31 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1089', diff saved to https://phabricator.wikimedia.org/P10952 and previous config saved to /var/cache/conftool/dbconfig/20200410-093129-marostegui.json
* 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:52 hashar@deploy1001: Finished deploy [zuul/deploy@4a69913]: (no justification provided) (duration: 00m 16s)
* 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:51 hashar@deploy1001: Started deploy [zuul/deploy@4a69913]: (no justification provided)
* 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:46 hashar@deploy1001: Finished deploy [zuul/deploy@5a0a03a]: (no justification provided) (duration: 02m 20s)
* 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:44 hashar@deploy1001: Started deploy [zuul/deploy@5a0a03a]: (no justification provided)
* 21:25 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 08:39 mutante: deploy1001 - keyholder disarm, keyholder arm
* 21:24 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 08:32 mutante: fix comment in deployment ssh key for zuul to include the path to the key on deploy1001
* 20:42 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 08:24 vgutierrez: update puppet compiler facts
* 20:42 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 08:20 hashar@deploy1001: Finished deploy [integration/zuul/deploy@6c3ddad]: (no justification provided) (duration: 00m 11s)
* 20:35 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 08:19 hashar@deploy1001: Started deploy [integration/zuul/deploy@6c3ddad]: (no justification provided)
* 20:34 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 08:03 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 05s)
* 20:33 taavi@deploy2002: Finished scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] (duration: 10m 56s)
* 08:03 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 20:33 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
* 07:52 mutante: closing port 80 on phab hosts for caching servers
* 20:24 taavi@deploy2002: abi and taavi: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 07:37 ema: cp3050: back to vhtcpd for the holidays [[phab:T249583|T249583]]
* 20:23 taavi@deploy2002: Started scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]]
* 07:00 mutante: sodium - sudo -u mirror ftpsync
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
* 06:58 mutante: armed keyholder on deploy1001
* 19:36 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
* 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 06:00 marostegui: Stop MySQL on pc1008 for upgrade
* 19:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 19:31 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 19:31 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2002
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 19:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 19:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 19:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2002
* 18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]]
* 17:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 17:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 17:39 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 17:38 mutante: moscovium - systemctl stop rsync
* 17:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 17:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 17:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 17:18 mutante: aphlict1001 - systemctl reset-failed; systemctl start logrotate ; systemctl start logrotate.timer
* 16:59 sukhe: rolling out CR 901333 to A:cp-text [[phab:T313578|T313578]]
* 16:45 sukhe: disable Puppet in A:cp to test and then merge CR 901333
* 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2002.codfw.wmnet with OS bullseye
* 16:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2002.codfw.wmnet with OS bullseye
* 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
* 16:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
* 16:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 16:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 16:01 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc1002.wikimedia.org with OS bullseye
* 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
* 15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
* 15:12 vgutierrez: testing haproxy_2.6.11-1~bpo11+wmf2_amd64.deb in text@ulsfo - [[phab:T332796|T332796]]
* 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc1002.wikimedia.org with OS bullseye
* 14:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
* 14:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host lists1003.wikimedia.org with OS bullseye
* 14:53 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:53 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:51 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
* 14:45 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
* 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1002.wikimedia.org
* 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
* 14:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host lists1003.wikimedia.org with OS bullseye
* 14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1002.wikimedia.org on all recursors
* 14:24 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc1002.wikimedia.org on all recursors
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
* 14:22 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host pybal-test2003.codfw.wmnet with OS bullseye
* 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
* 14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
* 14:16 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 14:15 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 14:15 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 14:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc1002.wikimedia.org
* 14:13 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 14:13 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 14:11 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d] (duration: 01m 32s)
* 14:11 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
* 14:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d]
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d] (duration: 00m 09s)
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d]
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d] (duration: 05m 10s)
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
* 14:03 joal@deploy2002: Started deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d]
* 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
* 13:55 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host pybal-test2003.codfw.wmnet with OS bullseye
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:46 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac] (duration: 01m 28s)
* 13:46 TheresNoTime: close UTC afternoon backport window
* 13:45 samtar@deploy2002: Finished scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] (duration: 07m 46s)
* 13:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac]
* 13:44 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac] (duration: 00m 08s)
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac]
* 13:43 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac] (duration: 13m 06s)
* 13:39 samtar@deploy2002: samtar: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:37 samtar@deploy2002: Started scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]]
* 13:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] (duration: 08m 05s)
* 13:30 joal@deploy2002: Started deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac]
* 13:29 samtar@deploy2002: samtar and sgimeno: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]]
* 13:26 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki ckbwiki --fix` [[phab:T332470|T332470]]
* 13:25 samtar@deploy2002: Finished scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] (duration: 08m 39s)
* 13:18 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:16 samtar@deploy2002: Started scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]]
* 13:15 samtar@deploy2002: Finished scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] (duration: 11m 47s)
* 13:08 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:03 samtar@deploy2002: Started scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]]
* 12:14 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-druid1001.eqiad.wmnet with OS bullseye
* 12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:58 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:57 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2004.codfw.wmnet with OS bullseye
* 11:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
* 11:47 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache upload cluster - [[phab:T332796|T332796]]
* 11:36 btullis@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-druid1001.eqiad.wmnet with OS bullseye
* 11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
* 11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
* 11:26 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc2002.wikimedia.org with OS bullseye
* 11:15 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2004.codfw.wmnet with OS bullseye
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
* 11:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
* 11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 11:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
* 10:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
* 10:44 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc2002.wikimedia.org with OS bullseye
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2002.wikimedia.org
* 10:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2005.codfw.wmnet with OS bullseye
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2002.wikimedia.org on all recursors
* 10:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc2002.wikimedia.org on all recursors
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
* 10:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
* 10:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc2002.wikimedia.org
* 10:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2005.codfw.wmnet with OS bullseye
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
* 09:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
* 09:47 moritzm: uploaded prometheus-druid-exporter 0.8-2 for bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]]
* 08:21 elukey: clean up docker and reboot kubernetes2024 to enable overlay2 - [[phab:T332803|T332803]]
* 08:11 vgutierrez: testing HAProxy 2.6.11 in cp4044 - [[phab:T332796|T332796]]
* 08:08 vgutierrez: fetch haproxy 2.6.11 in apt.wm.o thirdparty/haproxy26 for bullseye & buster
* 08:04 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache text cluster - [[phab:T332796|T332796]]
* 07:54 elukey: clean up docker and reboot kubernetes2023 to enable overlay2 - [[phab:T332803|T332803]]
* 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
* 07:42 elukey: clean up docker on kubernetes1024 (cordon + stop kubelet + docker + clean /var/lib/docker/*) and reboot to enable overlay2 - [[phab:T332803|T332803]]
* 07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
* 07:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45928 and previous config saved to /var/cache/conftool/dbconfig/20230323-072315-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45927 and previous config saved to /var/cache/conftool/dbconfig/20230323-070811-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45926 and previous config saved to /var/cache/conftool/dbconfig/20230323-065306-root.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45925 and previous config saved to /var/cache/conftool/dbconfig/20230323-063800-root.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45924 and previous config saved to /var/cache/conftool/dbconfig/20230323-062255-root.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45923 and previous config saved to /var/cache/conftool/dbconfig/20230323-060750-root.json
* 05:37 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 05:34 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 04:25 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 02:07 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 02:00 mutante: rsyncing ~4GB files for static-codereview.wikimedia.org from old to newer VMs for [[phab:T331896|T331896]] - no automatic sync / deploy for these
* 01:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]"
* 01:03 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]"
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 00:57 denisse@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host doc2002.codfw.wmnet with OS bullseye
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 00:27 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
* 00:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc1003.eqiad.wmnet with OS bullseye


== 2020-04-09 ==
== 2023-03-22 ==
* 23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 23:59 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
* 23:27 catrope@deploy1001: Synchronized wmf-config/mobile.php: Drop fallback support for wgMobileFrontendLogo ([[phab:T248500|T248500]]) (duration: 00m 58s)
* 23:56 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
* 23:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Drop unused config for main page CSS ([[phab:T243996|T243996]]) (duration: 00m 58s)
* 23:46 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc1003.eqiad.wmnet with OS bullseye
* 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add extendedconfirmed group and protection level on jawiki ([[phab:T249820|T249820]]) (duration: 00m 59s)
* 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
* 22:01 sukhe: running initial metadb sync on cescout1001
* 23:34 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
* 19:43 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:41 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 23:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 19:39 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 23:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 19:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 23:32 zabe: zabe@mwmaint2002:~$ mwscript namespaceDupes.php wikimaniawiki --fix # [[phab:T332782|T332782]]
* 19:01 longma: deploying 1.35.0-wmf.27 to all wikis
* 23:31 zabe@deploy2002: Finished scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] (duration: 10m 03s)
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1003.wikimedia.org
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:24 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:22 zabe@deploy2002: zabe: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 17:40 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 23:21 zabe@deploy2002: Started scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]]
* 17:24 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 21:15 taavi: UTC late backports complete
* 17:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:13 taavi@deploy2002: Finished scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] (duration: 07m 29s)
* 14:39 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:08 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc1003.eqiad.wmnet
* 14:32 XioNoX: disable down interfaces from fasw-c-codfw (mintaka)
* 21:08 taavi@deploy2002: taavi: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:06 taavi@deploy2002: Started scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]]
* 13:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 21:05 taavi@deploy2002: Finished scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] (duration: 07m 17s)
* 12:43 mlitn@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/MachineVision/: [MachineVision] Fix statement creation from suggestion (duration: 01m 09s)
* 20:59 taavi@deploy2002: taavi: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 12:31 ema: cp3051: upgrade varnish to 5.1.3-1wm13 once again, restart varnish-fe [[phab:T249809|T249809]]
* 20:58 taavi@deploy2002: Started scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]]
* 11:57 XioNoX: offload more traffic from NTT eqiad - [[phab:T249808|T249808]]
* 20:54 samtar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:900748{{!}}Enable page tools for anonymous users (T331052)]] (duration: 10m 10s)
* 11:20 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}587257{{!}}Enable ContentTranslation as a default tool in Slovenian WP (T248836)]], take II (duration: 01m 06s)
* 20:37 akosiaris: uncordon reboot kubernetes1023. It was drained previously for [[phab:T332803|T332803]]
* 11:19 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}587257{{!}}Enable ContentTranslation as a default tool in Slovenian WP (T248836)]] (duration: 01m 07s)
* 20:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] (duration: 11m 47s)
* 10:50 vgutierrez: rolling upgrade to trafficserver 8.0.6-1mw7
* 20:32 akosiaris: reboot kubernetes1023 for a test once more, [[phab:T332803|T332803]]
* 10:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:32 akosiaris: reboot kubernetes1023 for a test once more
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 20:28 samtar@deploy2002: samtar and nray: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 10:50 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:25 akosiaris: reboot kubernetes1023 for a test
* 10:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 20:24 samtar@deploy2002: Started scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]]
* 10:43 ema: repool cp3051 [[phab:T249809|T249809]]
* 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] (duration: 09m 57s)
* 10:30 ema: cp3051: re-enable transient storage limit, downgrade varnish to 5.1.3-1wm12 (no 0035-vbf_stp_condfetch_crash.patch) and restart varnish-fe [[phab:T249809|T249809]]
* 20:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists1003.wikimedia.org on all recursors
* 09:46 ema: cp3051: disable transient storage limit and restart varnish-fe [[phab:T249809|T249809]]
* 20:15 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache lists1003.wikimedia.org on all recursors
* 09:31 XioNoX: offload traffic from NTT eqiad - [[phab:T249808|T249808]]
* 20:15 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:56 mutante: contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3  (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) ([[phab:T224591|T224591]])
* 20:15 samtar@deploy2002: kharlan and samtar: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:56 mutante: contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3  (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206)
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]]
* 07:24 moritzm: synched jenkins 222.1 to apt.wikimedia.org (buster-wikimedia, thirdparty/ci) [[phab:T224591|T224591]]
* 20:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.eqiad.wmnet on all recursors
* 07:12 marostegui: Repool labsdb1011
* 20:11 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.eqiad.wmnet on all recursors
* 07:10 XioNoX: switch urpf from log to syslog in ulsfo
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:04 XioNoX: re-activate BGP to Zayo in eqiad
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
* 06:59 vgutierrez: upgrade ats to version 8.0.6-1wm7 in cp[4026,4032,5006,5012]
* 20:10 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
* 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] (duration: 07m 22s)
* 06:43 XioNoX: confirmed on one host that the change didn't break logstash. Re-enable Puppet on logstash hosts - [[phab:T244147|T244147]]
* 20:07 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 20:07 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.eqiad.wmnet
* 06:36 XioNoX: disabling puppet on logstash host for CR deploy - [[phab:T244147|T244147]]
* 20:07 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 06:30 XioNoX: push urpf log only to eqiad - [[phab:T244147|T244147]]
* 20:07 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1003.wikimedia.org
* 06:25 XioNoX: push urpf log only to eqsin - [[phab:T244147|T244147]]
* 20:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doc1003.wikimedia.org
* 06:21 XioNoX: push urpf log only to AMS - [[phab:T244147|T244147]]
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
* 05:40 vgutierrez: upgrade ats to version 8.0.6-1wm6 in cp[4025,4031,5005,5011] - [[phab:T249335|T249335]]
* 20:06 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
* 05:37 marostegui: Stop MySQL on pc2008 for upgrade to Buster and 10.4
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 01m 08s)
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 05:08 marostegui: Deploy schema change on db1123
* 20:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
* 05:07 vgutierrez: upload trafficserver 8.0.6-1wm6 to apt.wm.o (buster) - [[phab:T249335|T249335]]
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
* 20:05 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:04 samtar@deploy2002: samtar and matmarex: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:02 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0 (duration: 00m 21s)
* 20:02 samtar@deploy2002: Started scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]]
* 20:02 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0
* 20:01 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:01 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.wikimedia.org
* 18:16 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]
* 18:12 mutante: rsyncing /srv/org/wikimedia/sitemaps files for https://sitemaps.wikimedia.org from old to new machines. most other things are auto-deployed by puppet or puppet running initial scap or automatic rsync. this is not. rsync -av /srv/org/wikimedia/sitemaps/ rsync://miscweb2003.codfw.wmnet/miscapps-srv/org/wikimedia/sitemaps/ [[phab:T331896|T331896]] - but also see [[phab:T332101|T332101]]
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dborch1002.wikimedia.org
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
* 17:38 _joe_: stopping apache on mwdebug1001 to test the new envoy error page
* 17:15 hashar@deploy2002: Synchronized composer.json: build: add local typos check to composer.json # [[phab:T332121|T332121]] (duration: 06m 44s)
* 17:12 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
* 17:09 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 17:06 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 17:06 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 17:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts dborch1002.wikimedia.org
* 17:05 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 17:04 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:45 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided) (duration: 00m 12s)
* 16:45 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided)
* 16:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 16:37 eoghan@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
* 16:37 eoghan@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
* 16:35 vgutierrez: rolling downgrade to HAProxy 2.6.9 in text@esams - [[phab:T332796|T332796]]
* 16:24 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 16:19 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 16:18 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 16:18 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dborch1001.wikimedia.org with OS bullseye
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
* 15:53 moritzm: uploaded druid 0.19.wmf0-2 to bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]]
* 15:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
* 15:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
* 15:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
* 15:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
* 15:39 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:31 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:30 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1001.wikimedia.org with OS bullseye
* 15:27 elukey: `racadm racreset` for kafka-main2004 (no http idrac available for the cookbook, ssh one available)
* 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:26 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 15:25 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:25 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 15:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 15:22 hnowlan: removing java packages from maps hosts
* 15:17 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 15:17 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:13 hnowlan: removing cassandra packages from maps hosts
* 15:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 14:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:21 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45917 and previous config saved to /var/cache/conftool/dbconfig/20230322-141923-root.json
* 14:17 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 14:17 sukhe: enable Puppet on A:wikidough to roll out dnsdist.conf change
* 14:13 sukhe: disable Puppet on A:wikidough to roll out dnsdist.conf change
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45916 and previous config saved to /var/cache/conftool/dbconfig/20230322-140418-root.json
* 14:02 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45915 and previous config saved to /var/cache/conftool/dbconfig/20230322-134913-root.json
* 13:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45914 and previous config saved to /var/cache/conftool/dbconfig/20230322-133409-root.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45913 and previous config saved to /var/cache/conftool/dbconfig/20230322-131904-root.json
* 13:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG (duration: 00m 12s)
* 13:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG
* 13:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45912 and previous config saved to /var/cache/conftool/dbconfig/20230322-130359-root.json
* 13:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:44 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 12:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 12:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:30 marostegui: Poweroff db1121 (lag will show on wikireplicas for s4 section) [[phab:T323961|T323961]]
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool needs to be rebooted [[phab:T323961|T323961]]', diff saved to https://phabricator.wikimedia.org/P45910 and previous config saved to /var/cache/conftool/dbconfig/20230322-112031-root.json
* 11:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 11:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 11:15 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
* 11:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 11:09 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:02 jbond: upgrade prometheus-ipmi-exporter on buster and bullseye
* 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:36 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:34 elukey: `racadm racreset` for kafka-main2005 - iDRAC HTTP interface not available (ssh works fine)
* 10:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:29 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:26 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 10:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1004.eqiad.wmnet with OS bullseye
* 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
* 09:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1004.eqiad.wmnet with OS bullseye
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
* 09:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
* 09:23 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
* 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
* 09:11 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:10 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:02 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
* 08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
* 08:52 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 08:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:25 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:24 XioNoX: deploy measure-$site.wikimedia.org CNAMES
* 08:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 08:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 08:18 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 08:17 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141082
* 07:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141082
* 00:57 zabe@deploy2002: Finished scap: update interwiki cache (duration: 07m 02s)
* 00:50 zabe@deploy2002: Started scap: update interwiki cache
* 00:47 zabe@deploy2002: Finished scap: [[phab:T332115|T332115]] (duration: 06m 56s)
* 00:40 zabe@deploy2002: Started scap: [[phab:T332115|T332115]]
* 00:40 zabe: create Wikipedia Angika (anpwiki) # [[phab:T332115|T332115]]
* 00:38 zabe@deploy2002: Finished scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] (duration: 27m 00s)
* 00:29 zabe@deploy2002: zabe: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 00:11 zabe@deploy2002: Started scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]]


== 2020-04-08 ==
== 2023-03-21 ==
* 21:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: Restore call to OutputPage::setupOOUI() (duration: 01m 07s)
* 23:46 zabe@deploy2002: Finished scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] (duration: 30m 08s)
* 21:19 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: Restore call to OutputPage::setupOOUI() (duration: 01m 09s)
* 23:35 zabe@deploy2002: zabe: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 23:15 zabe@deploy2002: Started scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]]
* 20:09 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 23:07 zabe@deploy2002: Finished scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]] (duration: 07m 10s)
* 20:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 23:00 zabe@deploy2002: Started scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]]
* 20:06 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 22:47 ejegg: payments-wiki upgraded from {{Gerrit|0fd66b1f}} to {{Gerrit|ab0a55a2}}
* 20:04 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 22:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] (duration: 07m 15s)
* 20:04 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 22:04 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 19:51 gehel: restart wdqs-updater after deployment
* 22:03 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]]
* 19:49 mstyles@deploy1001: Finished deploy [wdqs/wdqs@c2995eb]: WDQS version 0.3.21 (duration: 14m 37s)
* 21:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 19:44 dpifke@deploy1001: Finished deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091 (duration: 00m 05s)
* 21:21 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 19:44 dpifke@deploy1001: Started deploy [performance/navtiming@4acb04d]: Deploy new navtiming with First Input Delay metric https://phabricator.wikimedia.org/T238091
* 21:02 AndyRussG: update SmashPig config {{Gerrit|6e651fd4}} -> {{Gerrit|035f602a}}
* 19:35 mstyles@deploy1001: Started deploy [wdqs/wdqs@c2995eb]: WDQS version 0.3.21
* 20:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]] (duration: 01m 06s)
* 20:48 taavi: start [[phab:T315510|T315510]] migration script on group2 s7 wikis
* 19:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 20:39 taavi@deploy2002: Finished scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] (duration: 09m 01s)
* 19:02 longma: deploying 1.35.0-wmf.27 to group1
* 20:31 taavi@deploy2002: matmarex and taavi: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 18:37 jforrester@deploy1001: Synchronized php-1.35.0-wmf.27/skins/Vector: [[phab:T248761|T248761]]: Revert moving indicators in DOM (duration: 01m 07s)
* 20:30 taavi@deploy2002: Started scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]]
* 18:17 reedy@deploy1001: Synchronized php-1.35.0-wmf.27/extensions/TemplateData/includes/TemplateDataHooks.php: [[phab:T236809|T236809]] (duration: 01m 06s)
* 20:20 taavi@deploy2002: Finished scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] (duration: 17m 40s)
* 18:16 reedy@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TemplateData/includes/TemplateDataHooks.php: [[phab:T236809|T236809]] (duration: 01m 10s)
* 20:10 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:09 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 17:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 20:04 taavi@deploy2002: esanders and taavi and matmarex: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:02 taavi@deploy2002: Started scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]]
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:52 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 16:16 ema: cache_upload: rolling varnish-fe restarts to bump transient storage limit [[phab:T185968|T185968]]
* 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:43 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:41 jhathaway@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host dborch1002.wikimedia.org with OS bullseye
* 15:11 ema: cp3051: param.set shortlived=0 to try ease pressure on transient memory
* 19:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P10947 and previous config saved to /var/cache/conftool/dbconfig/20200408-142341-marostegui.json
* 19:09 dancy@deploy2002: Installation of scap version "4.47.1" completed for 587 hosts
* 14:14 jeh@deploy1001: Finished deploy [horizon/deploy@0d18f67]: update horizon submodule to enable server groups (duration: 03m 30s)
* 19:07 dancy@deploy2002: Installing scap version "4.47.1" for 587 hosts
* 14:10 jeh@deploy1001: Started deploy [horizon/deploy@0d18f67]: update horizon submodule to enable server groups
* 19:04 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
* 13:40 mutante: stopped and masked zuul-merger service on contint2001 via puppet ([[phab:T224591|T224591]])
* 19:03 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag (duration: 00m 14s)
* 13:30 ema: cp3050: stop vhtcpd, start purged [[phab:T249583|T249583]]
* 19:03 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag
* 13:22 vgutierrez: enable inbound TLSv1.3 in text@ulsfo - [[phab:T170567|T170567]]
* 19:01 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
* 13:05 ema: purged 0.1 uploaded to buster-wikimedia [[phab:T249583|T249583]]
* 18:52 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1002.wikimedia.org with OS bullseye
* 12:31 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 07s)
* 18:38 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 12:29 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585219{{!}}Enable GrowthExperiments suggested edits on uk, hu, hy, eu wikipedias (T247308)]] (duration: 01m 08s)
* 18:36 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]
* 12:17 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584135{{!}}Enable GrowthExperiments welcome survey on Ukrainian, Hungarian, Armenian Wikipedias (T238295)]] (duration: 01m 08s)
* 18:00 AndyRussG: update SmashPig config {{Gerrit|59a8b2d2}} -> {{Gerrit|6e651fd}}
* 12:09 tgr@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:584183{{!}}Enable GrowthExperiments on French Wiktionary (T235964)]] (duration: 01m 06s)
* 17:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dborch1002.wikimedia.org
* 11:56 tgr@deploy1001: Synchronized dblists/: SWAT: [[gerrit:584183{{!}}Enable GrowthExperiments on French Wiktionary (T235964)]] (duration: 01m 03s)
* 17:40 joal@deploy2002: Finished deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] (duration: 00m 11s)
* 11:48 mutante: logstash1009 - restarted logstash
* 17:39 joal@deploy2002: Started deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b]
* 11:43 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585766{{!}}Enable WikibaseQualityConstraints on test commons (T248117)]] (duration: 01m 05s)
* 17:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-client1002.eqiad.wmnet
* 11:43 marostegui: Deploy schema change on db1112, this will generate lag on labs s3
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P10942 and previous config saved to /var/cache/conftool/dbconfig/20200408-114315-marostegui.json
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change', diff saved to https://phabricator.wikimedia.org/P10941 and previous config saved to /var/cache/conftool/dbconfig/20200408-113901-marostegui.json
* 16:53 mutante: sudo cumin -b 4 -s 40 'C:role::cache::text' 'run-puppet-agent'
* 11:29 tgr@deploy1001: Synchronized wmf-config/: SWAT: [[gerrit:584133{{!}}Deploy GrowthExperiments on Serbian Wikipedia (T241181)]] (duration: 01m 06s)
* 16:50 jbond: copy /usr/bin/prometheus-ipmi-exporter from bullseye to buster
* 11:28 tgr@deploy1001: Synchronized dblists/: SWAT: [[gerrit:584133{{!}}Deploy GrowthExperiments on Serbian Wikipedia (T241181)]] (duration: 01m 17s)
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 11:05 XioNoX: push urpf log only to codfw - [[phab:T244147|T244147]]
* 16:46 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 10:39 jbond42: restarting idp.wikimedia.org
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:14 marostegui: Deploy schema change on db1078
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P10940 and previous config saved to /var/cache/conftool/dbconfig/20200408-101431-marostegui.json
* 16:45 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
* 09:30 jynus: stopping and removing db1095:s8 instance
* 16:43 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 09:20 godog: upgrade grafana on cloudmetrics hosts - [[phab:T244208|T244208]]
* 16:43 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change', diff saved to https://phabricator.wikimedia.org/P10939 and previous config saved to /var/cache/conftool/dbconfig/20200408-091728-marostegui.json
* 16:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 09:11 gehel: setting weight=10 for all pooled wdqs servers in codfw - [[phab:T246343|T246343]]
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 09:10 marostegui: Reload proxies on dbproxy1018 and dbproxy1019 to depool labsdb1011 - [[phab:T249188|T249188]] [[phab:T248592|T248592]]
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 09:07 gehel: pooling wdqs200[78] - new servers ready to go! - [[phab:T246343|T246343]]
* 16:28 jbond: upload prometheus-ipmi-exporter_1.6.1 to bullseye
* 08:46 marostegui: Rename wb_terms and recreate views on labsdb1009-labsdb1011 - [[phab:T248592|T248592]] [[phab:T248086|T248086]]
* 16:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-test-client1002.eqiad.wmnet on all recursors
* 08:39 godog: upgrade grafana on grafana1002 - [[phab:T244208|T244208]]
* 16:15 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-test-client1002.eqiad.wmnet on all recursors
* 08:17 _joe_: switching parsoid to envoy (take 2) in eqiad
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:23 marostegui: Deploy schema change on db1075
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P10937 and previous config saved to /var/cache/conftool/dbconfig/20200408-072331-marostegui.json
* 16:13 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
* 06:31 marostegui: Deploy schema change on db1095:3313
* 16:10 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
* 06:11 marostegui: Stop haproxy on dbproxy1011 - [[phab:T231520|T231520]]
* 16:10 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-client1002.eqiad.wmnet
* 05:44 vgutierrez: rolling upgrade ATS to 8.0.6-1wm6 in cp[5006,5012,3065,3064,2042,2041,1090,1089]
* 15:57 jynus: running from cumin1001: transfer.py --type=decompress dbprov1003.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s5.2023-03-20--04-00-30.tar.gz db1145.eqiad.wmnet:/srv/sqldata.s5
* 05:34 marostegui: Deploy schema change on dbstore1004:3313
* 15:53 jhathaway@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.wikimedia.org
* 05:33 _joe_: repooling wtp1025, with envoy and logging any error above 404 [[phab:T249535|T249535]]
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 04:36 vgutierrez: rolling restart of ats-tls - [[phab:T249335|T249335]]
* 15:53 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 15:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 15:52 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:42 jbond: stop puppet from deploying this further
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
* 15:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
* 15:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] (duration: 09m 11s)
* 15:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:19 samtar@deploy2002: samtar: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:17 samtar@deploy2002: Started scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]]
* 15:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:10 samtar@deploy2002: Finished scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] (duration: 09m 32s)
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 15:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:02 samtar@deploy2002: samtar: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:02 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 15:00 samtar@deploy2002: Started scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]]
* 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 14:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 14:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=kartotherian,name=maps1005.eqiad.wmnet
* 14:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=maps1005.eqiad.wmnet
* 14:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 14:38 hnowlan: disabling puppet on maps* before merging 760619
* 14:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:27 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:17 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:14 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] (duration: 07m 53s)
* 14:10 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:02 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]]
* 14:00 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:58 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:40 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:33 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:28 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:25 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:21 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:16 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 13:05 elukey: move kafka mirror maker instances to PKI migration settings (new truststores) - [[phab:T319372|T319372]]
* 11:20 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:09 joal: Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00
* 11:08 joal: Kill mediacounts_load oozie job
* 11:07 joal: Unpause mediawiki_history_denormalize airflow job
* 11:06 joal: Kill mediawiki_denormalize oozie job
* 11:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s)
* 11:04 joal@deploy2002: Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b]
* 10:43 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:32 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:24 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s)
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9]
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s)
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9]
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s)
* 10:14 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9]
* 09:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
* 09:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
* 09:25 phedenskog@deploy2002: Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s)
* 09:25 phedenskog@deploy2002: Started deploy [performance/navtiming@d2b97ad]: (no justification provided)
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 08:31 elukey: move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - [[phab:T319372|T319372]]
* 06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
* 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
* 03:57 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s)
* 03:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]] (duration: 52m 38s)
* 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]


== 2020-04-07 ==
== 2023-03-20 ==
* 20:39 andrewbogott: correction: briefly downtiming ldap-eqiad-replica0 and ldap-eqiad-replica1.  I'm trying to investigate a possible split-brain so going to turn ldap off on one, and then the other, to see if behavior changes
* 22:00 samtar@deploy2002: Finished scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] (duration: 09m 45s)
* 20:37 andrewbogott: briefly downtiming serpens and seaborgium.  I'm trying to investigate a possible split-brain so going to turn ldap off on one, and then the other, to see if behavior changes
* 21:52 samtar@deploy2002: jdlrobson and samtar: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:34 hoo: (Take 3) Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 21:50 samtar@deploy2002: Started scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]]
* 20:17 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.27  refs [[phab:T247774|T247774]]
* 21:34 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki shwiki --fix` [[phab:T332614|T332614]]
* 20:09 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.27 (duration: 60m 34s)
* 21:25 TheresNoTime: closing UTC late backport window, extended
* 20:08 hoo: (Take 2) Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 21:22 samtar@deploy2002: Finished scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] (duration: 12m 22s)
* 19:45 hoo: Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata RDF dumps start now (broke as a side effect of [[phab:T249565|T249565]])
* 21:11 samtar@deploy2002: samtar and aleksandar: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 19:13 XioNoX: push pfw firewall rules - [[phab:T249650|T249650]]
* 21:10 samtar@deploy2002: Started scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]]
* 19:08 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.27
* 21:09 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer (duration: 00m 13s)
* 18:48 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.24 (duration: 12m 44s)
* 21:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer
* 17:56 herron: increasing codfw.mediawiki.job.cirrusSearchElasticaWrite to 3 partitions [[phab:T240702|T240702]]
* 21:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] (duration: 08m 34s)
* 17:55 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) retry (duration: 01m 02s)
* 21:02 samtar@deploy2002: matmarex and samtar: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 17:54 addshore: last sync stuck on sync-masters
* 21:00 TheresNoTime: extending UTC late backport window
* 17:54 addshore@deploy1001: sync-file aborted: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (14.5/14.5h) (duration: 01m 16s)
* 21:00 samtar@deploy2002: Started scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]]
* 17:49 ppchelko@deploy1001: Started restart [cpjobqueue/deploy@83c93d1]: Try to make it notice new partitions [[phab:T240702|T240702]]
* 20:58 kharlan@deploy2002: Finished scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] (duration: 10m 28s)
* 17:40 herron: increasing eqiad.mediawiki.job.cirrusSearchElasticaWrite to 3 partitions [[phab:T240702|T240702]]
* 20:49 kharlan@deploy2002: kharlan: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmn
* 16:24 longma: 1.35.0-wmf.27 was branched at {{Gerrit|e76ac29cd9c57bed4097ec8a4ea8311fb55fd967}} for [[phab:T247774|T247774]]
* 20:47 kharlan@deploy2002: Started scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]]
* 16:16 hashar: restarting CI jenkins
* 19:49 mutante: miscweb1003 - manually edit /srv/deployment/iegreview/iegreview-cache/.config and replace tin.eqiad.wmnet with deployment.eqiad.wmnet (which is an alias for deploy2002.codfw.wmnet) [[phab:T257317|T257317]] [[phab:T332623|T332623]] [[phab:T331896|T331896]]
* 15:53 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 19:13 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator (duration: 00m 13s)
* 15:21 moritzm: installing idp-test2001
* 19:13 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator
* 15:20 XioNoX: enable uRPF loose mode (log only) on cr4-ulsfo - [[phab:T244147|T244147]]
* 18:56 ejegg: switched back to new PayPal pending transaction resolver
* 15:17 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (12/14.5h) (duration: 01m 00s)
* 18:48 akosiaris@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 28s)
* 15:10 ema: cp3052: stop purged, start vhtcpd [[phab:T249583|T249583]] [[phab:T241232|T241232]]
* 18:47 akosiaris: emergency rollover of redis password complete
* 15:00 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 18:45 akosiaris: re-enable puppet on rdb*, netbox*, ores*, registry*
* 14:56 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (10/14.5h) (duration: 00m 55s)
* 18:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script (duration: 00m 13s)
* 14:52 jeh: cloudvirt2003-dev: downtime in icinga and reboot to enable BIOS virtualization support [[phab:T249453|T249453]]
* 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script
* 14:38 ema: cp3052: stop vhtcpd, start purged [[phab:T249583|T249583]]
* 18:42 ejegg: civicrm upgraded from {{Gerrit|3d3606f1}} to {{Gerrit|09373b9d}}
* 14:35 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (8/14.5h) (duration: 00m 58s)
* 18:32 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 14:25 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (4/14.5h) (duration: 00m 58s)
* 18:32 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 14:15 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (2/14.5h) (duration: 00m 58s)
* 18:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 14:08 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) take 2 (duration: 00m 57s)
* 18:32 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 13:57 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: REVERT [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 58s)
* 18:31 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 13:55 addshore@deploy1001: sync-file aborted: [[phab:T249565|T249565]] [[phab:T249595|T249595]] RejectParserCacheValue entries during wb_items_per_site drop incident (1h) (duration: 00m 29s)
* 18:30 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 13:17 vgutierrez: restart ats-tls on cp3056 - [[phab:T249335|T249335]]
* 18:30 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 12:59 vgutierrez: restart ats-tls on cp3052 - [[phab:T249335|T249335]]
* 18:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 12:50 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-6.list > [[phab:T249596|T249596]]-6.out # [[phab:T249565|T249565]]
* 18:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 12:43 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-5.list > [[phab:T249596|T249596]]-5.out # [[phab:T249565|T249565]]
* 18:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 12:42 vgutierrez: restart ats-tls on cp3058 - [[phab:T249335|T249335]]
* 18:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 12:25 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 12:06 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki --file [[phab:T249596|T249596]]-4.list > [[phab:T249596|T249596]]-4.out # [[phab:T249565|T249565]] [[phab:T249596|T249596]]
* 18:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 12:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 18:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'repool db1126', diff saved to https://phabricator.wikimedia.org/P10932 and previous config saved to /var/cache/conftool/dbconfig/20200407-115228-marostegui.json
* 18:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1126', diff saved to https://phabricator.wikimedia.org/P10931 and previous config saved to /var/cache/conftool/dbconfig/20200407-115154-marostegui.json
* 18:16 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092, db1111, db1099:3318 after table rename', diff saved to https://phabricator.wikimedia.org/P10930 and previous config saved to /var/cache/conftool/dbconfig/20200407-115058-marostegui.json
* 18:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 11:50 jynus: renaming wb_items_per_site_recovered to wb_items_per_site on s8
* 18:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 11:45 jynus: stopping s8 replication on db1116:3318, db1095:3318, db2079
* 18:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092, db1111, db1099:3318 for table rename', diff saved to https://phabricator.wikimedia.org/P10929 and previous config saved to /var/cache/conftool/dbconfig/20200407-114258-marostegui.json
* 18:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 11:36 Amir1: stopped the rebuild script ([[phab:T249565|T249565]])
* 18:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 11:34 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: cleanup [[phab:T203888|T203888]], Remove old unused RejectParserCacheValue hook (duration: 00m 59s)
* 18:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 11:09 marostegui: Deploy schema change on s3 codfw
* 18:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 11:07 jynus: starting recovery on all s8 hosts
* 18:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 10:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 18:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 10:41 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php: [[phab:T249565|T249565]] [[phab:T249596|T249596]] Wikibase rebuildItemsPerSite.php script that allows lists of ids (duration: 01m 00s)
* 18:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 10:27 jynus: starting recovery on db1099:3318
* 18:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 after schema change', diff saved to https://phabricator.wikimedia.org/P10927 and previous config saved to /var/cache/conftool/dbconfig/20200407-095852-marostegui.json
* 18:05 mutante: miscweb1003 - syntax error in httpd config due to "Unknown Authn provider: ldap" - comes from static-rt vhost ([[phab:T331896|T331896]])
* 09:49 volans@deploy1001: Finished deploy [homer/deploy@887544c]: Release v0.2.0 (take 2) (duration: 00m 26s)
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
* 09:49 volans@deploy1001: Started deploy [homer/deploy@887544c]: Release v0.2.0 (take 2)
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
* 09:38 marostegui: Deploy schema change on db1119
* 17:59 mutante: when applying apache role for the first time on new hosts we still have the same old conflict:  miscweb1003 - manual "a2dismod mpm_event" to be able to let puppet enable mod PHP ([[phab:T196968|T196968]])
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P10926 and previous config saved to /var/cache/conftool/dbconfig/20200407-093820-marostegui.json
* 17:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134 after schema change', diff saved to https://phabricator.wikimedia.org/P10925 and previous config saved to /var/cache/conftool/dbconfig/20200407-093638-marostegui.json
* 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
* 09:31 volans@deploy1001: Finished deploy [homer/deploy@b4522ad]: Release v0.2.0 (duration: 00m 16s)
* 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 09:31 volans@deploy1001: Started deploy [homer/deploy@b4522ad]: Release v0.2.0
* 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 09:29 volans@deploy1001: Finished deploy [homer/deploy@ac7a818]: Inject plugins (take 3) (duration: 03m 03s)
* 17:26 akosiaris: disable puppet on rdb*, netbox*, ores*, registry*
* 09:26 volans@deploy1001: Started deploy [homer/deploy@ac7a818]: Inject plugins (take 3)
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
* 09:19 marostegui: Deploy schema change on db1134
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P10924 and previous config saved to /var/cache/conftool/dbconfig/20200407-091847-marostegui.json
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 09:17 volans@deploy1001: Finished deploy [homer/deploy@a03d7cd]: Inject plugins (take 2) (duration: 00m 29s)
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 09:17 volans@deploy1001: Started deploy [homer/deploy@a03d7cd]: Inject plugins (take 2)
* 16:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 09:04 vgutierrez: testing ATS 8.0.6-1wm6 on cp4026 and cp4032
* 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 08:58 volans@deploy1001: Finished deploy [homer/deploy@a03d7cd]: Inject plugins (duration: 04m 59s)
* 16:36 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 08:53 volans@deploy1001: Started deploy [homer/deploy@a03d7cd]: Inject plugins
* 16:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 08:46 XioNoX: enable uRPF loose mode (log only) on cr3-ulsfo v4 uplinks - [[phab:T244147|T244147]]
* 16:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 08:44 XioNoX: enable uRPF loose mode (log only) on cr3-ulsfo v6 uplinks - [[phab:T244147|T244147]]
* 16:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 08:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:21 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 08:37 mutante: decom ganeti VM miscweb1001 (stretch) - kept backup of old racktables files and db dump in /root/racktables on miscweb1002 ([[phab:T247648|T247648]])
* 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 08:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 08:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:30 mutante: decom ganeti VM miscweb2001 (stretch)
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after schema change', diff saved to https://phabricator.wikimedia.org/P10923 and previous config saved to /var/cache/conftool/dbconfig/20200407-082607-marostegui.json
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:17 moritzm: installing php5 security updates
* 14:53 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 08:06 marostegui: Deploy schema change on db1106 (this will generate lag on s1 labs)
* 14:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for schema change', diff saved to https://phabricator.wikimedia.org/P10922 and previous config saved to /var/cache/conftool/dbconfig/20200407-080533-marostegui.json
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2552
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1080 after schema change', diff saved to https://phabricator.wikimedia.org/P10921 and previous config saved to /var/cache/conftool/dbconfig/20200407-080443-marostegui.json
* 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2552
* 07:52 _joe_: disabling puppet on mwdebug1002
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:47 marostegui: Failover dbproxy1011 to dbproxy1019 - [[phab:T231520|T231520]]
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:43 marostegui: Deploy schema change on db1080
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 and promote es2027 to es3 master', diff saved to https://phabricator.wikimedia.org/P45896 and previous config saved to /var/cache/conftool/dbconfig/20230320-143951-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for schema change', diff saved to https://phabricator.wikimedia.org/P10920 and previous config saved to /var/cache/conftool/dbconfig/20200407-074321-marostegui.json
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:41 dcausse@deploy1001: Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs2002: [[phab:T249196|T249196]] (duration: 01m 28s)
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:40 dcausse@deploy1001: Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs2002: [[phab:T249196|T249196]]
* 14:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]]
* 07:39 _joe_: depooling wtp1025, used for debugging
* 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]]
* 07:31 vgutierrez: enable parent proxies in ats-tls - [[phab:T249335|T249335]]
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:19 jynus: restarting s3 on db1095
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:02 moritzm: updating linux-image-4.9.0-11-amd64 where applicable
* 14:17 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 06:55 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:11 TheresNoTime: close UTC afternoon backport window
* 06:53 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
* 06:52 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
* 06:37 moritzm: installing ruby2.1 security updates
* 14:08 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autopatrol' 'autopatrolled'` [[phab:T331762|T331762]]
* 06:32 jynus: stopping slave (s3) on db1095
* 14:06 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 05:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:586488{{!}}Fix database name for repo in testwikidata (T249533)]], take II (duration: 00m 58s)
* 14:05 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreview' 'autopatrol'` [[phab:T331762|T331762]]
* 05:37 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:586488{{!}}Fix database name for repo in testwikidata (T249533)]] (duration: 01m 00s)
* 14:03 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki slwiki --fix` [[phab:T332351|T332351]]
* 05:26 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'reviewer' 'patrol'` [[phab:T331762|T331762]]
* 01:08 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/maintenance/: [[phab:T157651|T157651]] Remove sql.php from maintenance/ (duration: 00m 58s)
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreviewer' 'autopatrol'` ("nothing to do") [[phab:T331762|T331762]]
* 01:06 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/autoload.php: [[phab:T157651|T157651]] Remove sql.php from autoloader (duration: 00m 58s)
* 14:00 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki ptwikisource editor` [[phab:T331762|T331762]]
* 01:05 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: [[phab:T208425|T208425]] [[phab:T249565|T249565]] Follow-up {{Gerrit|a956c655}}: Only avoid dropping wb_items_per_site so prod can be merged (duration: 00m 58s)
* 13:58 samtar@deploy2002: Finished scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] (duration: 09m 44s)
* 00:01 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table [[phab:T208425|T208425]] [[phab:T249565|T249565]] cache bust (duration: 01m 01s)
* 13:50 samtar@deploy2002: thiemowmde and samtar and zoranzoki21: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:49 samtar@deploy2002: Started scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]]
* 13:47 samtar@deploy2002: Finished scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] (duration: 09m 26s)
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host cuminunpriv1001.eqiad.wmnet with OS bullseye
* 13:39 samtar@deploy2002: aleksandar and samtar: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:38 samtar@deploy2002: Started scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]]
* 13:37 samtar@deploy2002: Finished scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] (duration: 08m 46s)
* 13:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
* 13:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
* 13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
* 13:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
* 13:30 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 30s)
* 13:29 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
* 13:29 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}}
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]]
* 13:28 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:26 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 39s)
* 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
* 13:24 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}}
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] (duration: 11m 36s)
* 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host cuminunpriv1001.eqiad.wmnet with OS bullseye
* 13:17 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 13:17 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 13:14 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:14 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 13:08 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:06 samtar@deploy2002: Started scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]]
* 11:35 krinkle@deploy2002: Synchronized php-1.40.0-wmf.27/includes/libs/rdbms/: (no justification provided) (duration: 15m 28s)
* 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
* 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
* 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141082
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58655
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58655
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2552
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2552
* 09:21 claime: Repooling parse2004 - [[phab:T332119|T332119]]
* 08:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 138915
* 08:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 138915
* 08:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138915
* 08:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138915


== 2020-04-06 ==
== 2023-03-19 ==
* 23:59 addshore@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table [[phab:T208425|T208425]] [[phab:T249565|T249565]] (duration: 00m 59s)
* 18:27 AndyRussG: update config (to re-enable old PayPal orphan slayer job) {{Gerrit|27a5b481}} -> {{Gerrit|6359222d}}
* 23:31 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki-staging/php-1.35.0-wmf.26$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki
* 16:44 apergos: dumpsdata1005 conversion to primary dumps nfs server done
* 23:26 Amir1: created wb_items_per_site
* 15:12 AndyRussG: update config (to disable paypal_ec pending transaction resolver) {{Gerrit|5dd37c9c}} -> {{Gerrit|3d3606f1}}
* 19:05 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:18 apergos: work starting now to swap dumpsdata1005 in for primary nfs server, replacing dumpsdata1003 which will become dumps spare host
* 19:03 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:17 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
* 19:00 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:17 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 18:58 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:57 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 18:51 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:42 elukey@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:22 Urbanecm: Morning SWAT done
* 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|335a924}}: Enable Local upload on azbwiki ([[phab:T248971|T248971]]; take II) (duration: 00m 58s)
* 18:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|335a924}}: Enable Local upload on azbwiki ([[phab:T248971|T248971]]) (duration: 00m 59s)
* 16:54 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:52 _joe_: parsoid migrated to use envoy for TLS termination
* 16:24 _joe_: switching parsoid-php to envoy for TLS termination
* 15:45 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Label blacklist updates ([[phab:T249285|T249285]]) (duration: 00m 58s)
* 15:36 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:04 elukey@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 14:59 addshore: deploy slot done
* 14:55 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration [[phab:T248664|T248664]] (cache bust) (duration: 00m 57s)
* 14:54 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration [[phab:T248664|T248664]] (duration: 00m 57s)
* 14:50 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix [[phab:T248664|T248664]] (cache bust) (duration: 00m 57s)
* 14:49 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix [[phab:T248664|T248664]] (duration: 00m 58s)
* 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10912 and previous config saved to /var/cache/conftool/dbconfig/20200406-144220-marostegui.json
* 14:41 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config [[phab:T248664|T248664]] (cache bust) (duration: 00m 58s)
* 14:40 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config [[phab:T248664|T248664]] (duration: 00m 59s)
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10911 and previous config saved to /var/cache/conftool/dbconfig/20200406-143755-marostegui.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10910 and previous config saved to /var/cache/conftool/dbconfig/20200406-143042-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10909 and previous config saved to /var/cache/conftool/dbconfig/20200406-142607-marostegui.json
* 14:24 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki [[phab:T248664|T248664]] (cachebust) (duration: 00m 58s)
* 14:23 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki [[phab:T248664|T248664]] (duration: 00m 59s)
* 14:09 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:07 elukey@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 14:07 elukey@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:47 sukhe: upload cescout 0.1.1-1 to apt.wm.o (buster) - [[phab:T247273|T247273]]
* 13:26 elukey: reboot stat1008 as test to verify ROCm 3.3 upgrades
* 13:22 elukey: stat1008 upgraded to ROCm 3.3 (enables Tensorflow 2.x)
* 13:05 ema: cache: upgrade varnish to 5.1.3-1wm13, begin rolling varnish-fe restarts [[phab:T249344|T249344]]
* 13:03 marostegui: Deploy schema change on db1118
* 13:03 jbond42: updating gnutls on buster
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P10906 and previous config saved to /var/cache/conftool/dbconfig/20200406-130320-marostegui.json
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 after schema change', diff saved to https://phabricator.wikimedia.org/P10905 and previous config saved to /var/cache/conftool/dbconfig/20200406-130255-marostegui.json
* 12:59 Urbanecm: Creation of grwikimedia is done ([[phab:T245911|T245911]])
* 12:59 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 22s)
* 12:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:54 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:53 marostegui: Deploy schema change on db1107
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 for schema change', diff saved to https://phabricator.wikimedia.org/P10904 and previous config saved to /var/cache/conftool/dbconfig/20200406-125308-marostegui.json
* 12:52 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 58s)
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P10903 and previous config saved to /var/cache/conftool/dbconfig/20200406-125222-marostegui.json
* 12:46 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: {{Gerrit|77b9ae9}}: Create grwikimedia
* 12:44 urbanecm@deploy1001: Synchronized dblists/: {{Gerrit|77b9ae9}}: Create grwikimedia (duration: 00m 59s)
* 12:37 XioNoX: Update eqiad analytics filters with new APT IPs
* 12:27 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:21 marostegui: Deploy schema change on db1089
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P10902 and previous config saved to /var/cache/conftool/dbconfig/20200406-122123-marostegui.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10901 and previous config saved to /var/cache/conftool/dbconfig/20200406-122058-marostegui.json
* 12:14 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:08 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 12:04 godog: test grafana 6.7.2 upgrade on grafana2001 - [[phab:T244208|T244208]]
* 11:57 awight: EU swat complete
* 11:53 awight@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/TwoColConflict: SWAT: [[gerrit:586309{{!}}Backport talk page and EventLogging changes (T248243, T249404)]] (duration: 00m 59s)
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 11:48 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 11:48 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:586325{{!}}Create account creator and rollback groups on yowiki (T249487)]] (duration: 00m 59s)
* 11:32 awight@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/ContentTranslation: SWAT: [[gerrit:586311{{!}}Avoid failure on restoring draft with no categories (T249400)]] (duration: 01m 02s)
* 11:25 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: double-syncing (duration: 00m 58s)
* 11:24 marostegui: Deploy schema change on db1105:3311
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10900 and previous config saved to /var/cache/conftool/dbconfig/20200406-112417-marostegui.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10899 and previous config saved to /var/cache/conftool/dbconfig/20200406-112123-marostegui.json
* 11:18 elukey: import AMD ROCm 3.3 packages in buster-wikimedia (component thirdparty/rocm33) - [[phab:T247082|T247082]]
* 11:17 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:580394{{!}}cirrus: Increase commonswiki near match weight (T245642)]] (duration: 00m 59s)
* 11:11 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:585779{{!}} Whitelist X-Wikimedia-Debug header for cross-wiki API requests (T249107)]] (duration: 00m 59s)
* 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:586305{{!}} Bumping portals to master (563985)]] (duration: 00m 58s)
* 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:586305{{!}} Bumping portals to master (563985)]] (duration: 01m 12s)
* 09:50 XioNoX: push pfw firewall policies - [[phab:T249267|T249267]]
* 09:40 marostegui: Deploy schema change on db1099:3311
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10898 and previous config saved to /var/cache/conftool/dbconfig/20200406-093944-marostegui.json
* 09:11 ema: cp2027: upgrade varnish to 5.1.3-1wm13 and restart varnish-fe [[phab:T249344|T249344]]
* 09:08 ema: upload varnish 5.1.3-1wm13 to buster-wikimedia on apt1001.wm.org [[phab:T249344|T249344]]
* 08:55 ariel@deploy1001: Finished deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link (duration: 00m 09s)
* 08:55 ariel@deploy1001: Started deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link
* 08:54 elukey: bootstrap wdqs200[7,8] - [[phab:T246343|T246343]]
* 08:50 marostegui: Deploy schema change on db1139:3311
* 08:18 _joe_: conversion of codfw api done
* 08:07 marostegui: Deploy schema change on dbstore1003:3311
* 07:54 vgutierrez: rolling restart of ats-tls to disable wmf-analytics log - [[phab:T249335|T249335]] [[phab:T237993|T237993]]
* 07:50 dcausse: search index: deleting stale index wikidatawiki_content_1585224806 on cloudelastic:9243
* 07:49 _joe_: eqiad API migrated to envoy for local TLS termination, now starting codfw
* 07:35 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 as attempt to fix heavy GC runs (old gen) - [[phab:T231517|T231517]]
* 07:35 marostegui: Rename wb_terms on eqiad excluding labsdb1009, labsdb1010, labsdb1011 - [[phab:T248086|T248086]]
* 07:06 marostegui: Rename wb_terms on codfw - [[phab:T248086|T248086]]
* 06:45 XioNoX: delete BGP to AS25074 in amsix
* 06:36 _joe_: converting the api servers to envoy for TLS in eqiad
* 06:30 marostegui: Upgrade dbproxy1019 - [[phab:T231520|T231520]]
* 06:18 marostegui: Deploy schema change on s1 codfw master, this will generate lag on codfw
* 05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:50 vgutierrez: ats-tls restart in cp3056, cp3058 and cp3062 - [[phab:T249335|T249335]]
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P10897 and previous config saved to /var/cache/conftool/dbconfig/20200406-054559-marostegui.json
* 05:18 marostegui: Deploy schema change on db1079 (this will generate lag on s7 labs)
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P10896 and previous config saved to /var/cache/conftool/dbconfig/20200406-051744-marostegui.json
* 05:16 vgutierrez: Enable inbound TLSv1.3 in upload@eqiad - [[phab:T170567|T170567]]
* 05:16 vgutierrez: Enable TLS Session Tickets on eqiad - [[phab:T245616|T245616]]
* 05:03 vgutierrez: ats-tls restart in cp1075, cp1081 and cp1087 - [[phab:T249335|T249335]]


== 2020-04-03 ==
== 2023-03-18 ==
* 21:17 andrewbogott: upgraded wikitech-static to 1.34.1
* 22:47 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
* 17:58 mutante: rsync home dirs from install1002 to apt1001:/srv/home_install1002...
* 22:47 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 15:43 ema: cp3061: restart varnish-fe [[phab:T249344|T249344]]
* 14:26 apergos: rsync of xmldata public dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
* 15:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:46 apergos: rsync of xmldata private dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
* 15:19 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 07:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 15:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 07:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 15:18 ema: cp3057: restart varnish-fe [[phab:T249344|T249344]]
* 02:57 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
* 14:37 hashar: Restarting Jenkins for a CSP parameter [[phab:T245658|T245658]]
* 02:57 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 14:07 vgutierrez: restart ats-tls on cp1087 - [[phab:T249335|T249335]]
* 01:21 urandom: powercycling restbase2025 — [[phab:T332462|T332462]]
* 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10882 and previous config saved to /var/cache/conftool/dbconfig/20200403-140132-marostegui.json
* 00:06 AndyRussG: Updating civicrm from {{Gerrit|5dd37c9c}} to {{Gerrit|3d3606f1}}
* 13:55 vgutierrez: restart ats-tls on cp1075 and cp1081 - [[phab:T249335|T249335]]
* 12:49 marostegui: Deploy schema change on db1090:3317
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10881 and previous config saved to /var/cache/conftool/dbconfig/20200403-124908-marostegui.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P10880 and previous config saved to /var/cache/conftool/dbconfig/20200403-124827-marostegui.json
* 12:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing [[phab:T249196|T249196]] (duration: 00m 43s)
* 12:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing [[phab:T249196|T249196]]
* 12:27 marostegui: Deploy schema change on db1136
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P10879 and previous config saved to /var/cache/conftool/dbconfig/20200403-122716-marostegui.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P10878 and previous config saved to /var/cache/conftool/dbconfig/20200403-122259-marostegui.json
* 12:00 marostegui: Deploy schema change on db1094
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P10877 and previous config saved to /var/cache/conftool/dbconfig/20200403-115959-marostegui.json
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20200403-115854-marostegui.json
* 11:40 marostegui: Deploy schema change on db1098:3317
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10875 and previous config saved to /var/cache/conftool/dbconfig/20200403-114004-marostegui.json
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10874 and previous config saved to /var/cache/conftool/dbconfig/20200403-113717-marostegui.json
* 10:38 marostegui: Deploy schema change on db1101:3317
* 10:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|861b267}}: Enable cswiki anniversary logo ([[phab:T249173|T249173]]) (duration: 01m 02s)
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10872 and previous config saved to /var/cache/conftool/dbconfig/20200403-103746-marostegui.json
* 09:32 marostegui: Deploy schema change on db1116:3317
* 08:43 marostegui: Deploy schema change on dbstore1003:3317
* 07:57 marostegui: Deploy schema change on s7 codfw master, this will generate lag on codfw
* 06:55 XioNoX: add fastnetmon 1.1.4 to buster-wikimedia - [[phab:T240658|T240658]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P10870 and previous config saved to /var/cache/conftool/dbconfig/20200403-062529-marostegui.json
* 05:21 marostegui: Deploy schema change on db1126
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P10869 and previous config saved to /var/cache/conftool/dbconfig/20200403-052115-marostegui.json
* 00:42 catrope@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/FlaggedRevs/: Fix logic for determining if pending edits were null ([[phab:T249277|T249277]]) (duration: 01m 00s)


== 2020-04-02 ==
== 2023-03-17 ==
* 23:53 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
* 19:53 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching (duration: 00m 13s)
* 23:09 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't try to grant 'oathauth-enable' to '*' (part 2) ([[phab:T248282|T248282]]) (duration: 00m 58s)
* 19:53 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching
* 19:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Translate/specials/SpecialExportTranslations.php: [[phab:T249258|T249258]]: Revert 'Special:ExportTranslations: Disallow exporting huge groups' (duration: 00m 59s)
* 19:52 bd808: Testing Mastodon account changes. This should post to @wikimedia_sal@botsin.space
* 19:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]] (duration: 15m 13s)
* 19:06 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch (duration: 00m 13s)
* 19:35 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/MovePage.php: [[phab:T248789|T248789]] MovePage: Use correct Title when creating the null revision (duration: 00m 59s)
* 19:06 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch
* 19:30 hashar: docker-pkg update on contint hosts
* 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
* 19:30 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 12s)
* 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
* 19:29 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
* 19:23 ppchelko@deploy1001: Started deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]]
* 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
* 19:05 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
* 19:00 longma: promoting all to 1.35.0-wmf.26
* 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
* 18:39 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]] (duration: 01m 05s)
* 18:10 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
* 18:38 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 18:09 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 18:37 longma: rolling group1 to 1.35.0-wmf.26
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
* 18:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/MobileFrontend/: SWAT: {{Gerrit|4e2a092}}: EditorGateway: Fix handling of null sectionId ([[phab:T249169|T249169]]) (duration: 01m 09s)
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
* 18:22 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/VisualEditor/modules/ve-mw: SWAT: {{Gerrit|94ded03}}: Fix issues with treating section "numbers" as integers ([[phab:T248795|T248795]]; [[phab:T248968|T248968]]; [[phab:T249112|T249112]]) (duration: 01m 10s)
* 17:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
* 17:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}} (duration: 03m 21s)
* 17:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
* 17:45 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}}
* 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet
* 16:53 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8] (duration: 00m 08s)
* 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet
* 16:53 joal@deploy1001: Started deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8]
* 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:49 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/actions/Action.php: [[phab:T249162|T249162]] Partially revert 'WikiPage/Article split. Rely on Article inside Action' (duration: 01m 07s)
* 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:44 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8] (duration: 13m 50s)
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
* 16:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
* 16:34 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
* 15:29 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 16:33 jforrester@deploy1001: sync-file aborted: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 00m 00s)
* 15:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 01m 07s)
* 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 16:30 joal@deploy1001: Started deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8]
* 14:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 16:19 XioNoX: upgrade netflow4001's fastnetmon to 1.1.4 - [[phab:T240658|T240658]]
* 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:56 XioNoX: push new test switch config for cloudvirt2001 - [[phab:T248425|T248425]]
* 14:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 14:33 vgutierrez: Enable inbound TLSv1.3 in upload@codfw - [[phab:T170567|T170567]]
* 14:54 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:33 vgutierrez: Enable TLS Session tickets in codfw - [[phab:T245616|T245616]]
* 14:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:24 jbond42: updating bluez on ganeti and cloudvirt
* 14:13 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10865 and previous config saved to /var/cache/conftool/dbconfig/20200402-142338-marostegui.json
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10864 and previous config saved to /var/cache/conftool/dbconfig/20200402-141802-marostegui.json
* 13:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10863 and previous config saved to /var/cache/conftool/dbconfig/20200402-141335-marostegui.json
* 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10862 and previous config saved to /var/cache/conftool/dbconfig/20200402-141149-marostegui.json
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:50 marostegui: Compress wbqc_constraints on testcommonswiki and commonswiki (empty tables) - [[phab:T248967|T248967]]
* 13:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 13:44 vgutierrez: update puppet compiler facts
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:40 marostegui: Deploy schema change on db1111
* 13:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P10861 and previous config saved to /var/cache/conftool/dbconfig/20200402-133956-marostegui.json
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:32 gehel: OSM data reimport on maps2004 - [[phab:T249086|T249086]]
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 12:55 mutante: mw1390 - mw1399 - pooled and active but status "staged" in netbox, fixing to 'active'
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 12:52 mutante: mw1297 - is pooled and serving traffic but status "staged" in netbox. set to "active"
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after schema change', diff saved to https://phabricator.wikimedia.org/P10858 and previous config saved to /var/cache/conftool/dbconfig/20200402-114020-marostegui.json
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 11:06 mutante: decom planet1001 ([[phab:T248863|T248863]])
* 13:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2004.codfw.wmnet
* 10:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:21 claime: Depooling parse2004.codfw.wmnet for broken PSU - [[phab:T332119|T332119]]
* 10:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 12:06 mutante: systemctl reset-failed on gitlab-runner*
* 10:19 marostegui: Deploy schema change on db1087, this will generate lag on s8 on wiki replicas
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P10857 and previous config saved to /var/cache/conftool/dbconfig/20200402-101920-marostegui.json
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:17 elukey: set up TLS encryption for all pmacct instances on netflow* to Kafka Jumbo
* 11:03 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P10856 and previous config saved to /var/cache/conftool/dbconfig/20200402-101747-marostegui.json
* 11:02 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 09:47 marostegui: Remove haproxy@10.64.37.14 from labsdb hosts - [[phab:T231280|T231280]] [[phab:T248944|T248944]]
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:44 gehel: CORRECTION: depool maps2004 for data reimport - [[phab:T249086|T249086]]
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:40 gehel: depool wdqs2004 for data reimport - [[phab:T249086|T249086]]
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:33 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 18s)
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:32 oblivian@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:28 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@4f86d77]: (no justification provided) (duration: 00m 09s)
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 09:28 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4f86d77]: (no justification provided)
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:51 marostegui: Deploy schema change on db1104
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for schema change', diff saved to https://phabricator.wikimedia.org/P10854 and previous config saved to /var/cache/conftool/dbconfig/20200402-085057-marostegui.json
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl', diff saved to https://phabricator.wikimedia.org/P45887 and previous config saved to /var/cache/conftool/dbconfig/20230317-055643-marostegui.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P10853 and previous config saved to /var/cache/conftool/dbconfig/20200402-085019-marostegui.json
* 02:10 ejegg: civicrm upgraded from {{Gerrit|672950d9}} to {{Gerrit|5dd37c9c}}
* 08:28 gehel: repooling wdqs1006 - caught up on lag
* 01:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2010.codfw.wmnet
* 08:22 vgutierrez: Enable inbound TLSv1.3 in upload@esams - [[phab:T170567|T170567]]
* 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2010.codfw.wmnet
* 08:21 vgutierrez: Enable TLS Session tickets in esams - [[phab:T245616|T245616]]
* 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
* 07:45 moritzm: bounced ferm on ms-be1040
* 00:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
* 07:27 marostegui: Deploy schema change on db1092
* 00:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P10850 and previous config saved to /var/cache/conftool/dbconfig/20200402-072730-marostegui.json
* 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10849 and previous config saved to /var/cache/conftool/dbconfig/20200402-072500-marostegui.json
* 00:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates
* 05:49 marostegui: Deploy schema change on db1101:3318
* 00:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10848 and previous config saved to /var/cache/conftool/dbconfig/20200402-054931-marostegui.json
* 05:29 elukey: powercycle analytics1045 (host not responsive to ssh, weird chars showed in mgmt serial console)


== 2020-04-01 ==
== 2023-03-16 ==
* 22:44 volker-e@deploy1001: Finished deploy [design/style-guide@4bfe647]: Deploy design/style-guide:  (duration: 00m 08s)
* 23:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
* 22:43 volker-e@deploy1001: Started deploy [design/style-guide@4bfe647]: Deploy design/style-guide:
* 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
* 22:02 volans: forcing logrotate on netflow2001 to compress yesterday's logs
* 23:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
* 21:53 volans: force-rebooting ms-be1023, unresponsive - [[phab:T249174|T249174]]
* 23:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
* 21:50 volans: stopped and restarted kafkatee-webrequest.service on netflow2001, was in a restart loop
* 23:31 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb2003.codfw.wmnet with OS bullseye
* 19:48 marxarelli: rollback of 1.35.0-wmf.26 from group1 ([[phab:T247773|T247773]]). blocked by [[phab:T249162|T249162]]
* 23:28 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb1003.eqiad.wmnet with OS bullseye
* 19:30 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rollback 1.35.0-wmf.26 from group1
* 23:20 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 (duration: 00m 19s)
* 19:21 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26 (duration: 01m 06s)
* 23:20 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0
* 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26
* 23:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
* 19:18 marxarelli: promoting group1 to 1.35.0-wmf.26
* 23:15 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
* 17:21 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqord*' commit 'enable sampling on eqord Iac15379cc'
* 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
* 16:54 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqdfw*' commit 'enable sampling on eqdfw Iac15379cc'
* 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
* 16:39 vgutierrez: pool cp2027 - [[phab:T248816|T248816]]
* 23:01 dzahn@cumin1001: START - Cookbook sre.ganeti.reimage for host miscweb1003.eqiad.wmnet with OS bullseye
* 16:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:00 dzahn@cumin2002: START - Cookbook sre.ganeti.reimage for host miscweb2003.codfw.wmnet with OS bullseye
* 16:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb1003.eqiad.wmnet
* 16:17 ariel@deploy1001: Finished deploy [dumps/dumps@21363c1]: page range prefetch fixup (duration: 00m 09s)
* 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb2003.codfw.wmnet
* 16:17 ariel@deploy1001: Started deploy [dumps/dumps@21363c1]: page range prefetch fixup
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb1003.eqiad.wmnet on all recursors
* 15:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:39 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache miscweb1003.eqiad.wmnet on all recursors
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:31 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
* 15:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 15:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host miscweb1003.eqiad.wmnet
* 15:27 vgutierrez: depool & decommission cp20[16,19,23,27] - [[phab:T249125|T249125]]
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb2003.codfw.wmnet on all recursors
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10845 and previous config saved to /var/cache/conftool/dbconfig/20200401-152258-marostegui.json
* 22:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache miscweb2003.codfw.wmnet on all recursors
* 15:11 herron: performing kafka-main rolling restarts to pick up security updates
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 22:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
* 14:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 14:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 22:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host miscweb2003.codfw.wmnet
* 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:24 ejegg: civicrm upgraded from {{Gerrit|68fa85cf}} to {{Gerrit|672950d9}}
* 14:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:43 vgutierrez: depool && decommission cp[2018,2020,2022,2024-2026].codfw.wmnet - [[phab:T249115|T249115]]
* 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:32 gehel: depooling wdqs1006 to allow catching up on lag
* 22:04 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:30 vgutierrez: pool cp2042 - [[phab:T248816|T248816]]
* 21:54 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:47 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]]
* 14:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:36 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): blockers hopefully resolved, rolling to all wikis
* 14:09 XioNoX: remove AS-path prepending in esams
* 20:35 TheresNoTime: close UTC late backport window
* 13:47 XioNoX: remove AS-path prepending in eqsin
* 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] (duration: 08m 18s)
* 13:39 vgutierrez: pool cp2041 - [[phab:T248816|T248816]]
* 20:28 samtar@deploy2002: samtar and sharvaniharan: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 13:34 mutante: sodium (mirror): sudo -u mirror ftpsync to get Debian mirror updated (Icinga says it's old)
* 20:26 samtar@deploy2002: Started scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]]
* 13:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:21 brennen@deploy2002: Finished scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] (duration: 09m 06s)
* 13:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:14 brennen@deploy2002: brennen and jforrester: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:17 marostegui: Deploy schema change on db1099:3318
* 20:12 brennen@deploy2002: Started scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]]
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10843 and previous config saved to /var/cache/conftool/dbconfig/20200401-131719-marostegui.json
* 19:28 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s)
* 13:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:27 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided)
* 13:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:41 wfan: enable monthlyconvert for cz
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:40 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s)
* 12:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 18:40 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided)
* 12:19 tgr@deploy1001: Synchronized wmf-config/config: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 06s)
* 18:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet
* 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 12:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet
* 12:17 tgr@deploy1001: Synchronized dblists/growthexperiments.dblist: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 05s)
* 18:03 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet
* 12:17 XioNoX: restart nfacct on netflow4001 for kafka tls tests - [[phab:T248980|T248980]]
* 17:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
* 12:15 vgutierrez: depool & decommission cp2013 - [[phab:T249088|T249088]]
* 17:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
* 12:14 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 06s)
* 17:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 12:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585059{{!}}Enable password-reset-update on all other than Wikipedias (T245791)]] (duration: 01m 07s)
* 17:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 12:09 marostegui: Deploy schema change on db1116:3318
* 17:40 ayounsi@cumin2002: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 12:05 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons take 2 (duration: 01m 08s)
* 17:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 12:04 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons (duration: 01m 05s)
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]; take II) (duration: 01m 05s)
* 17:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]) (duration: 01m 07s)
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 11:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable WikibaseQualityConstraints on Commons take II (duration: 01m 06s)
* 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 11:44 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable WikibaseQualityConstraints on Commons (duration: 01m 18s)
* 16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s)
* 11:20 cormacparle__: created table wbqc_constraints on commonswiki
* 16:58 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade.
* 11:03 jbond42: install bluez update on ganeti-canary and cloudvirt/cloudcontrol-dev
* 16:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
* 11:01 mutante: planet1001 - reinstall OS to test install_server switch, ATS switched to planet1002 earlier
* 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
* 10:47 marostegui: Deploy schema change on dbstore1005:3318
* 16:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
* 10:25 vgutierrez: pool cp2040 - [[phab:T248816|T248816]]
* 16:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
* 10:16 oblivian@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: service=canary
* 16:31 Emperor: reboot ms-be2067 again to see if the missing drive comes back
* 09:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
* 09:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:39 claime: Pooled new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]]
* 09:37 marostegui: Deploy schema change on s8 codfw, this will generate lag on codfw
* 15:28 sukhe: enable puppet on R:class = dnsrecursor to merge CR: 898957 [done]
* 09:35 XioNoX: Update install servers IPs (dhcp helpers + firewall rules) - [[phab:T224576|T224576]]
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
* 09:34 mutante: install_servers: DHCP_relay in routers and TFTP server in DHCP server config have been switched from install1002/2002 to install1003/2003 - doing a test install, but if there are any issues, report them on [[phab:T224576|T224576]]
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
* 09:26 marostegui: last entry was for db2093
* 15:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
* 09:26 marostegui: Downgrade mariadb package from 10.4.12-2 to 10.4.12-1
* 15:15 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
* 09:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:15 claime: Pooling new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]]
* 09:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
* 09:05 mutante: planet - the backend server has been switched from planet1001 (stretch) to planet1002 (buster) - [[phab:T247651|T247651]]
* 15:12 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
* 08:46 mutante: deneb, boron: systemctl reset-failed to clear up systemd state alerts
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
* 08:43 marostegui: Stop haproxy on dbproxy1010 [[phab:T248944|T248944]]
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
* 08:37 jynus: restart bacula at backup1001
* 15:10 sukhe: disable puppet on R:class = dnsrecursor to merge CR: 898957
* 08:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 32 hosts
* 08:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 08:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:49 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 08:28 vgutierrez: depool & decommission cp2017 - [[phab:T249084|T249084]]
* 14:44 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:21 vgutierrez: pool cp2039 - [[phab:T248816|T248816]]
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:09 marostegui: Deploy schema change on db1138 (s4 primary master)
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 after schema change', diff saved to https://phabricator.wikimedia.org/P10841 and previous config saved to /var/cache/conftool/dbconfig/20200401-071339-marostegui.json
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:12 vgutierrez: pool cp2038 - [[phab:T248816|T248816]]
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 06:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:06 urandom: ALTER-ing image_suggestions.suggestion table — [[phab:T328670|T328670]]
* 06:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 13:35 kostajh: UTC afternoon deploys done
* 06:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:34 kharlan@deploy2002: Finished scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] (duration: 07m 44s)
* 06:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:28 kharlan@deploy2002: kharlan: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 06:36 vgutierrez: depool & decommission cp2012 - [[phab:T249080|T249080]]
* 13:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]]
* 06:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:15 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] (duration: 09m 48s)
* 06:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:07 kharlan@deploy2002: kharlan: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 05:39 marostegui: Deploy schema change on db1121 (this will create lag on s4 labs)
* 13:05 kharlan@deploy2002: Started scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]]
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P10840 and previous config saved to /var/cache/conftool/dbconfig/20200401-053827-marostegui.json
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
* 00:39 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Update http and prot rel links to https, fix link to sitelist in MW Core (duration: 01m 06s)
* 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
* 00:12 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Add export-0.11 (duration: 01m 05s)
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 12:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
* 11:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
* 11:43 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
* 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
* 11:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
* 11:27 hnowlan@puppetmaster1001: conftool action : set/weight=3; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 32 hosts with reason: new_install
* 11:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 32 hosts with reason: new_install
* 11:10 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
* 11:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
* 10:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:38 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
* 10:33 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 10:32 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 10:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:30 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:28 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 to move it to x1', diff saved to https://phabricator.wikimedia.org/P45885 and previous config saved to /var/cache/conftool/dbconfig/20230316-100945-root.json
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1105.eqiad.wmnet
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:49 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:48 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1105.eqiad.wmnet
* 08:40 kostajh: UTC morning deploys (second round) done
* 08:40 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] (duration: 12m 30s)
* 08:29 kharlan@deploy2002: kharlan: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]]
* 08:11 apergos: additional deployments for the UTC morning backport and config training window, running into the next hour, so window re-opened
* 07:36 tgr_: UTC morning deploys done
* 07:34 tgr@deploy2002: Finished scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] (duration: 08m 13s)
* 07:28 tgr@deploy2002: tgr: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:26 tgr@deploy2002: Started scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]]
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105 from dbctl [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45883 and previous config saved to /var/cache/conftool/dbconfig/20230316-062307-root.json
* 06:03 marostegui: Failover m5 from db1106 to db1176 - [[phab:T332155|T332155]]
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]]
* 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]]
* 03:29 ejegg: payments-wiki upgraded from {{Gerrit|1532b107}} to {{Gerrit|0fd66b1f}}


== 2020-03-31 ==
== 2023-03-15 ==
* 22:23 marxarelli: group0 to 1.35.0-wmf.26 ([[phab:T247773|T247773]]); no rise in error rates following redeployment
* 22:55 tzatziki: Removing 1 file for legal compliance
* 22:13 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.26
* 22:30 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 55s)
* 22:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 22:29 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]])
* 21:54 dduvall@deploy1001: sync aborted: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]]) (duration: 07m 31s)
* 22:29 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 28s)
* 21:47 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 22:28 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]])
* 21:46 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/user/UserNameUtils.php: [[phab:T249045|T249045]] Use wfMessage in UserNameUtils::isUsable for now (duration: 00m 58s)
* 22:08 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str (duration: 00m 14s)
* 21:05 eileen: process-control config revision is {{Gerrit|f80d248113}} - (catch up dedupe now off - fyi MBeat )
* 22:07 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str
* 20:59 hashar: contint1001: manually reverted /lib/systemd/system/jenkins.service
* 21:59 brennen: end of phabricator update window ([[phab:T331915|T331915]])
* 20:51 hashar: Restarting Jenkins for new CSP rules # [[phab:T245658|T245658]]
* 21:47 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 40s)
* 20:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rolling back 1.35.0-wmf.26 testwiki deployment following significant increase in error rate (cc [[phab:T247773|T247773]])
* 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]])
* 20:14 marxarelli: correction: RequestContext::getLanguage errors are for testwiki deployment, pre group0
* 21:46 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 28s)
* 20:08 marxarelli: a slew of "ErrorException from line 334 of /srv/mediawiki/php-1.35.0-wmf.26/includes/context/RequestContext.php: PHP Warning: Recursion detected in RequestContext::getLanguage" after group0 deployment (cc [[phab:T247773|T247773]])
* 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]])
* 20:04 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache (duration: 142m 48s)
* 21:26 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]]) (duration: 00m 52s)
* 19:20 ariel@deploy1001: Finished deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly (duration: 00m 04s)
* 21:25 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]])
* 19:20 ariel@deploy1001: Started deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly
* 21:19 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] (duration: 00m 11s)
* 18:08 ariel@deploy1001: Finished deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date (duration: 00m 05s)
* 21:19 milimetric@deploy2002: Started deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893]
* 18:07 ariel@deploy1001: Started deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date
* 21:13 mutante: phab* - upgrading PHP packages
* 17:42 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache
* 21:13 mutante: phabricator - maintenance window starting - expect possible downtime
* 17:40 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.23 (duration: 26m 51s)
* 21:08 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
* 17:38 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad.service on cloudelastic1001 to see if it recovers from a thrashing/gc state - [[phab:T231517|T231517]]
* 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
* 16:30 marxarelli: 1.35.0-wmf.26 was branched at {{Gerrit|bec758b668aaa57fc259a1d0ecf3b35340d2661b}} for [[phab:T247773|T247773]]
* 20:56 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]]) (duration: 00m 31s)
* 16:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 00s)
* 20:55 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]])
* 16:15 vgutierrez: pool cp2037 - [[phab:T248816|T248816]]
* 20:54 brennen: starting phabricator window a touch early with a test deploy to phab2002
* 15:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:51 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor (duration: 00m 16s)
* 15:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:51 ebernhardson@deploy2002: Started deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor
* 15:35 mutante: decom mw1254 through mw1258 (last remaining old servers in rack D5, depooled a while ago and average response time is again under 200ms) [[phab:T247780|T247780]]
* 20:48 TheresNoTime: close UTC late backport window
* 15:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:48 samtar@deploy2002: Finished scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] (duration: 08m 46s)
* 15:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:41 samtar@deploy2002: matmarex and samtar and esanders: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 15:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:39 samtar@deploy2002: Started scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]]
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] (duration: 10m 30s)
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 20:33 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3002.wikimedia.org with OS bullseye
* 15:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:27 samtar@deploy2002: samtar and tsepothoabala: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:25 samtar@deploy2002: Started scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]]
* 15:26 vgutierrez: depool & decommission cp2010 - [[phab:T249002|T249002]]
* 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] (duration: 10m 12s)
* 15:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 20:20 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bullseye
* 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245794|T245794]] Enable DiscussionTools as a beta feature on four wikis (duration: 01m 00s)
* 20:17 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bullseye
* 15:05 cdanis: cr1-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
* 15:01 cdanis: cr2-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
* 14:43 vgutierrez: pool cp2036 - [[phab:T248816|T248816]]
* 20:15 samtar@deploy2002: sgimeno and samtar: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw125[4-8].eqiad.wmnet
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]]
* 14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
* 14:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 14s)
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:11 taavi: deploy patch for [[phab:T331192|T331192]]
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1091 after schema change', diff saved to https://phabricator.wikimedia.org/P10834 and previous config saved to /var/cache/conftool/dbconfig/20200331-141459-marostegui.json
* 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
* 14:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
* 14:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
* 14:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
* 19:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
* 13:31 vgutierrez: Enable TLS Session tickets in eqsin - [[phab:T245616|T245616]]
* 19:54 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3002.wikimedia.org with OS bullseye
* 13:05 XioNoX: update nat on pfw3-codfw - [[phab:T248906|T248906]]
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
* 12:49 _joe_: switching all appserver canaries to envoy
* 19:53 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3001.wikimedia.org with OS bullseye
* 12:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:50 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 12:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 19:49 taavi@deploy2002: Finished scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] (duration: 12m 04s)
* 12:45 marostegui: Deploy schema change on db1091
* 19:48 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1002.wikimedia.org with OS bullseye
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for schema change', diff saved to https://phabricator.wikimedia.org/P10833 and previous config saved to /var/cache/conftool/dbconfig/20200331-124452-marostegui.json
* 19:47 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 12:34 _joe_: transitioning mw1261 to envoy
* 19:46 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bullseye
* 12:23 vgutierrez: rolling upgrade of ATS to version 8.0.6-1wm5 - [[phab:T248938|T248938]]
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
* 11:30 Lucas_WMDE: EU SWAT done
* 19:45 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2002.wikimedia.org with OS bullseye
* 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]], take II (duration: 00m 57s)
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]] (duration: 00m 58s)
* 19:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bullseye
* 11:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]], take II (duration: 00m 59s)
* 19:41 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bullseye
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]] (duration: 01m 00s)
* 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
* 10:46 _joe_: disabled puppet on canary appservers, potentially dangerous change ahead
* 19:39 taavi@deploy2002: taavi: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084 after schema change', diff saved to https://phabricator.wikimedia.org/P10831 and previous config saved to /var/cache/conftool/dbconfig/20200331-101953-marostegui.json
* 19:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 10:03 XioNoX: add BGP to AS41327 in AMS-IX
* 19:37 taavi@deploy2002: Started scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]]
* 09:49 XioNoX: push homer diffs to mr1-eqsin
* 19:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
* 09:36 XioNoX: push homer diffs to mr1-eqiad
* 19:35 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
* 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
* 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
* 09:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:32 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
* 09:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 19:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
* 09:05 vgutierrez: upload trafficserver 8.0.5-1wm6 to apt.wm.o (buster) - [[phab:T248938|T248938]]
* 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
* 09:00 vgutierrez: depool & decommission cp2011 - [[phab:T248950|T248950]]
* 19:28 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
* 08:44 vgutierrez: pool cp2035 - [[phab:T248816|T248816]]
* 19:27 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 08:31 mutante: signed puppet cert for planet1002.eqiad.wmnet
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
* 08:29 marostegui: Depool db1084 for schema change
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for schema change', diff saved to https://phabricator.wikimedia.org/P10829 and previous config saved to /var/cache/conftool/dbconfig/20200331-082904-marostegui.json
* 19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1081 after schema change', diff saved to https://phabricator.wikimedia.org/P10828 and previous config saved to /var/cache/conftool/dbconfig/20200331-082711-marostegui.json
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
* 08:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
* 08:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1001.wikimedia.org with OS bullseye
* 08:01 XioNoX: delete unused ROA for ARIN v4 prefixes - [[phab:T235886|T235886]]
* 19:16 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2001.wikimedia.org with OS bullseye
* 07:49 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bullseye
* 07:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 19:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3001.wikimedia.org with OS bullseye
* 07:17 vgutierrez: pool cp2034 - [[phab:T248816|T248816]]
* 19:05 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6002.wikimedia.org with OS bullseye
* 07:16 marostegui: Deploy schema change on db1081
* 19:03 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bullseye
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 for schema change', diff saved to https://phabricator.wikimedia.org/P10827 and previous config saved to /var/cache/conftool/dbconfig/20200331-071547-marostegui.json
* 18:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10826 and previous config saved to /var/cache/conftool/dbconfig/20200331-071401-marostegui.json
* 18:49 mutante: adding new language prefix anp.wikipedia.org - Angika, an Eastern Indo-Aryan language spoken in some parts of the Indian states of Bihar and Jharkhand, as well as in parts of Nepal. ([[phab:T332115|T332115]])
* 06:48 marostegui: Deploy schema change on db1103:3314
* 18:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10825 and previous config saved to /var/cache/conftool/dbconfig/20200331-064707-marostegui.json
* 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10824 and previous config saved to /var/cache/conftool/dbconfig/20200331-064627-marostegui.json
* 18:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
* 05:55 marostegui: Drop nova and nova_api from m5 master (db1133) - [[phab:T248313|T248313]]
* 18:25 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6001.wikimedia.org with OS bullseye
* 05:55 kart_: Updated cxserver to 2020-03-30-145349-production ([[phab:T248578|T248578]])
* 18:24 brennen@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] (duration: 06m 08s)
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 05:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 18:19 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5002.wikimedia.org with OS bullseye
* 05:53 vgutierrez: depool && decommission cp2007 - [[phab:T248941|T248941]]
* 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]]
* 05:48 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 18:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 05s)
* 05:46 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries
* 05:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:06 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group1.
* 05:46 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 18:04 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5001.wikimedia.org with OS bullseye
* 05:26 marostegui: Deploy schema change on db1097:3314
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10822 and previous config saved to /var/cache/conftool/dbconfig/20200331-051354-marostegui.json
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 00:26 eileen: civicrm revision changed from {{Gerrit|cf2e2c11c3}} to {{Gerrit|524b162174}}, config revision is {{Gerrit|708198a154}}
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
* 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
* 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
* 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.wmnet
* 17:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2006.codfw.wmnet
* 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bullseye
* 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2006.codfw.wmnet
* 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2004.codfw.wmnet
* 17:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2004.codfw.wmnet
* 17:29 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
* 17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
* 17:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
* 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
* 17:12 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5001.wikimedia.org with OS bullseye
* 17:05 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4001.wikimedia.org with OS bullseye
* 16:19 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 16:19 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 16:17 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 16:17 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 16:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
* 16:02 hnowlan: restarted thumbor-instances on thumbor1006
* 16:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 15:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 15:52 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
* 15:49 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
* 15:44 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4002.wikimedia.org with OS bullseye
* 15:34 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
* 15:33 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 15:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 15:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 15:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 14:54 Emperor: depool moss-fe1001 as rate of token denial is too high
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 14:53 claime: Redeploying mw-on-k8s for php7.4 update [[phab:T330270|T330270]]
* 14:52 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 14:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:46 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:41 cgoubert@deploy2002: Started scap: (no justification provided)
* 14:41 claime: Rebuilding mw-on-k8s images - [[phab:T330270|T330270]]
* 14:38 claime: Updating php7.4 production images
* 14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:34 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
* 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
* 14:24 daniel@deploy2002: Finished scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] (duration: 09m 57s)
* 14:22 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 14:22 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 14:22 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=pki
* 14:22 jbond: switch pki to be active-active
* 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 14:20 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 14:19 jbond: update pki to use discovery record
* 14:16 jbond@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=pki
* 14:15 daniel@deploy2002: daniel: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:14 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4002.wikimedia.org with OS bullseye
* 14:14 daniel@deploy2002: Started scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]]
* 14:12 sukhe: [correction] depool _doh4002_ for reimaging to bullseye: [[phab:T321309|T321309]]
* 14:12 sukhe: depool dns4002 for reimaging to bullseye: [[phab:T321309|T321309]]
* 14:00 moritzm: nodejs security updates on buster
* 13:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS bullseye
* 13:50 sukhe: reprepro -C component/pdns-recursor include bullseye-wikimedia pdns-recursor_4.6.2-1+wmf11u1_amd64.changes: [[phab:T321309|T321309]]
* 13:49 moritzm: installing graphite-web security updates
* 13:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
* 13:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:28 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 13:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:27 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 13:27 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 13:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
* 13:26 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:24 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:21 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:20 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:17 taavi@deploy2002: Finished scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] (duration: 09m 01s)
* 13:12 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS bullseye
* 13:10 taavi@deploy2002: matmarex and taavi and esanders: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebu
* 13:08 taavi@deploy2002: Started scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]]
* 13:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 13:07 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:18 marostegui: Failover m5 from db1176 to db1106 - [[phab:T331877|T331877]]
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]]
* 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]]
* 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 11:36 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 11:34 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 11:32 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 11:30 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
* 11:27 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 11:26 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
* 11:20 moritzm: imported packages into thirdparty/ceph-quincy
* 11:16 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 11:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 11:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 11:00 claime: Redirecting test.wikidata.org to mw-on-k8s - [[phab:T331268|T331268]]/25
* 10:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:29 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:28 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:25 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:24 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:23 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:22 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:22 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:21 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:20 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:19 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 10:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 10:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 10:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 10:08 jayme@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 10:08 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 09:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 09:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 09:49 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 09:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 09:45 jayme@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
* 09:39 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
* 09:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 09:26 moritzm: rolling restart of FPM/Apache to pick up gnutls28 security updates
* 09:22 moritzm: installing gnutls28 security updates
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl [[phab:T331875|T331875]]', diff saved to https://phabricator.wikimedia.org/P45872 and previous config saved to /var/cache/conftool/dbconfig/20230315-090515-root.json
* 08:40 hashar@deploy2002: Finished deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]] (duration: 00m 19s)
* 08:40 hashar@deploy2002: Started deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]]
* 08:15 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 07:40 tgr_: UTC morning deploys done
* 07:39 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2067.codfw.wmnet
* 07:36 tgr@deploy2002: Finished scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] (duration: 07m 54s)
* 07:30 tgr@deploy2002: tgr: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:28 tgr@deploy2002: Started scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]]
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45870 and previous config saved to /var/cache/conftool/dbconfig/20230315-062643-root.json
* 06:20 marostegui: Remove pki2001 from m1 grants [[phab:T332018|T332018]]


== 2020-03-30 ==
== 2023-03-14 ==
* 23:30 cdanis: cr3-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 23:29 brennen@deploy2002: Finished scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] (duration: 10m 32s)
* 23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 23:20 brennen@deploy2002: brennen and umherirrender: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 23:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Alphabetize wikis in each GrowthExperiments settings (duration: 00m 58s)
* 23:19 brennen@deploy2002: Started scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]]
* 23:16 cdanis: cr2-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 22:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 23:08 cdanis: cdanis@cr3-knams# commit comment "sensible flow table sizes [[phab:T248394|T248394]]"
* 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 22:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Provide wmgSiteLogoIcon (duration: 00m 57s)
* 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 22:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgSiteLogoIcon for each project family and four special wikis (duration: 00m 58s)
* 22:08 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 22:50 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Set wgMobileFrontendLogo from wgLogos['icon'] if set (duration: 00m 59s)
* 21:38 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 22:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 57s)
* 21:38 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 22:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Split wgLogos setting into wmgSiteLogo1x etc. (duration: 00m 59s)
* 21:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 22:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Construct wgLogos in CommonSettings so that projects can inherit values (duration: 01m 02s)
* 21:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 19:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:16 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 19:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 21:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 15:36 ejegg: updated payments listener (standalone SmashPig) from {{Gerrit|dc0c6b208b}} to {{Gerrit|d80e4c5abd}}
* 21:11 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 15:32 vgutierrez: pool cp2033 - [[phab:T248816|T248816]]
* 21:11 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 15:25 jeh: add icinga 2h downtime and soft reset iDRAC on labstore1005.mgmt.eqiad.wmnet [[phab:T247965|T247965]]
* 20:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 14:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:47 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:43 ejegg: payments-wiki upgraded from {{Gerrit|61c30a4f}} to {{Gerrit|1532b107}}
* 14:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 20:35 zabe@deploy2002: Finished scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] (duration: 08m 36s)
* 14:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 20:28 zabe@deploy2002: zabe: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:53 vgutierrez: depool & decommission cp2008 - [[phab:T248864|T248864]]
* 20:27 zabe@deploy2002: Started scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]]
* 14:23 vgutierrez: pool cp2032 - [[phab:T248816|T248816]]
* 20:04 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 14:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 19:47 topranks: Reboot cloudsw1-b1-codfw to upgrade JunOS version [[phab:T327919|T327919]]
* 14:01 vgutierrez: depool & decommission cp2006 - [[phab:T248856|T248856]]
* 19:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
* 13:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
* 13:45 vgutierrez: pool cp2031 - [[phab:T248816|T248816]]
* 19:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:30 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): uneventful at group0. i'm afk for about an hour.
* 13:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 19:13 ejegg: civicrm upgraded from {{Gerrit|dbe3b716}} to {{Gerrit|68fa85cf}}
* 13:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS bullseye
* 13:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:28 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
* 12:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 18:27 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 12:53 vgutierrez: depool & decommission cp2005 - [[phab:T248848|T248848]]
* 18:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
* 12:26 cdanis: cdanis@re0.cr2-codfw# set chassis fpc 5 inline-services flex-flow-sizing    cdanis@re0.cr2-codfw# commit comment "flex-flow-sizing [[phab:T248394|T248394]]"
* 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 12:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 12:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 12:21 vgutierrez: depool & decommission cp2004 - [[phab:T248824|T248824]]
* 18:22 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 30s)
* 12:03 XioNoX: delete unused ROA for ARIN v6 prefixes - [[phab:T235886|T235886]]
* 18:22 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 11:59 XioNoX: delete unused ROAs for RIPE prefixes - [[phab:T235886|T235886]]
* 18:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 11:42 mutante: miscweb2002 - race condition with apache2 mpm and php7.3 module met - a2dismod mpm_event ; systemctl restart apache2 ; puppet agent -tv (also see [[phab:T196968|T196968]], https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) [[phab:T247887|T247887]]
* 18:13 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]]
* 11:37 mutante: miscweb2002 - installed OS, added to puppet, added role and  ... sed -i 's/tin.eqiad/deployment.eqiad/g' /srv/deployment/iegreview/iegreview-cache/.config ([[phab:T247648|T247648]])
* 18:13 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS bullseye
* 11:30 marostegui: Deploy schema change on dbstore1004:3314
* 18:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 11:22 XioNoX: delete ARIN allocations from RIPE's IRR - [[phab:T235886|T235886]]
* 18:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 11:11 Urbanecm: EU SWAT done
* 18:03 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group0.
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]; take II) (duration: 00m 58s)
* 17:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]) (duration: 00m 58s)
* 17:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:08 vgutierrez: pool cp2030 - [[phab:T248816|T248816]]
* 17:58 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and associated talk pages to trwiktionary ([[phab:T248734|T248734]]; take II) (duration: 00m 59s)
* 17:56 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and associated talk pages to trwiktionary ([[phab:T248734|T248734]]) (duration: 00m 59s)
* 17:56 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:43 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:55 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:34 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:59 hoo: Temporarily modified dumpsgen's crontab on snapshot1008 so that the Wikidata JSON dumps start at 9:59 UTC today ([[phab:T248612|T248612]])
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:56 hoo@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/Wikibase/repo/maintenance/DumpEntities.php: DumpEntities: Fix DB group default override ([[phab:T248612|T248612]]) (duration: 01m 02s)
* 17:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 09:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 09:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
* 08:30 vgutierrez: pool cp2029 - [[phab:T248816|T248816]]
* 16:47 sukhe: rolling restart of pdns-rec in A:wikidough to pick up config changes
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:47 sukhe: rolling restart of pdns-rec to pick up config changes
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 08:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pki2001.codfw.wmnet
* 07:53 vgutierrez: depool & decommission cp2002 - [[phab:T248818|T248818]]
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:48 marostegui: Run cloudcontrol1003:~# wmcs-wikireplica-dns to promote dbproxy1018 to wikireplicas active proxy [[phab:T231520|T231520]]
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
* 07:40 marostegui: Replace dbproxy1010 with dbproxy1011 for wiki replicas, analytics - [[phab:T231520|T231520]]
* 16:13 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
* 07:28 marostegui: Deploy schema change on labswiki (wikitech) - [[phab:T248333|T248333]]
* 16:11 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 07:26 marostegui: Deploy schema change on s4 codfw, this will generate lag on codfw - [[phab:T248333|T248333]]
* 16:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
* 07:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
* 07:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 16:00 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts pki2001.codfw.wmnet
* 07:10 vgutierrez: depool and decommission cp2001 - [[phab:T248815|T248815]]
* 15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS bullseye
* 06:52 vgutierrez: pool cp2028 - [[phab:T247340|T247340]]
* 15:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
* 06:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P10813 and previous config saved to /var/cache/conftool/dbconfig/20200330-062858-marostegui.json
* 15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 06:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:32 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
* 06:04 marostegui: Deploy schema change on db1074 with replication, this will generate lag on s2 labs
* 15:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P10812 and previous config saved to /var/cache/conftool/dbconfig/20200330-060338-marostegui.json
* 15:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
* 05:40 vgutierrez: pool cp2027 - [[phab:T247340|T247340]]
* 15:19 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS bullseye
* 05:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:00 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 05:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 04:55 vgutierrez: Enable TLS Session tickets in ulsfo - [[phab:T245616|T245616]]
* 14:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 04:32 vgutierrez: upgrade ATS to version 8.0.6-1wm4 on ulsfo - [[phab:T245616|T245616]]
* 14:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 14:53 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 14:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 14:52 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 14:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:51 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:42 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1001.eqiad.wmnet with OS bullseye
* 14:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
* 14:16 claime: All active/active services in eqiad repooled, DNS issues resolved - [[phab:T331541|T331541]]
* 14:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2122 weight', diff saved to https://phabricator.wikimedia.org/P45866 and previous config saved to /var/cache/conftool/dbconfig/20230314-140926-root.json
* 14:01 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki1001.eqiad.wmnet with OS bullseye
* 14:00 jbond: reimage pki1001
* 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 13:33 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results (again, with sukhe's more-correct variant!)
* 13:27 TheresNoTime: close UTC afternoon backport window
* 13:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] (duration: 07m 24s)
* 13:20 samtar@deploy2002: samtar and urbanecm: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:19 samtar@deploy2002: Started scap: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]]
* 13:18 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] (duration: 07m 55s)
* 13:11 samtar@deploy2002: esanders and samtar: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 13:10 samtar@deploy2002: Started scap: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:04 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 13:02 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 12:44 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 12:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45864 and previous config saved to /var/cache/conftool/dbconfig/20230314-123515-marostegui.json
* 12:23 moritzm: installing git security updates
* 12:20 samtar@deploy2002: Finished scap: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] (duration: 09m 12s)
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45863 and previous config saved to /var/cache/conftool/dbconfig/20230314-122009-marostegui.json
* 12:20 TheresNoTime: `Command '['helmfile', '-e', 'eqiad', '--selector', 'name=canary', 'apply']' returned non-zero exit status 1.` (P45862) during scap deployment of [[phab:T297396|T297396]] + [[phab:T331680|T331680]] — scap rolled back
* 12:18 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host pki-root1001.eqiad.wmnet with OS bullseye
* 12:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 12:13 samtar@deploy2002: samtar and varnent: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 12:11 samtar@deploy2002: Started scap: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]]
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
* 12:08 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
* 12:08 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 12:06 claime: Unlocked scap deployments - [[phab:T331541|T331541]]
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45861 and previous config saved to /var/cache/conftool/dbconfig/20230314-120503-marostegui.json
* 12:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 12:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
* 11:51 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
* 11:51 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45860 and previous config saved to /var/cache/conftool/dbconfig/20230314-114957-marostegui.json
* 11:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 11:41 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 11:27 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 11:27 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45857 and previous config saved to /var/cache/conftool/dbconfig/20230314-112354-marostegui.json
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45856 and previous config saved to /var/cache/conftool/dbconfig/20230314-112333-marostegui.json
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
* 11:19 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
* 11:13 claime: We are encountering unexpected DNS anycast issues following [[phab:T331541|T331541]]; latencies are increased but there is no production outage.
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45855 and previous config saved to /var/cache/conftool/dbconfig/20230314-110826-marostegui.json
* 11:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 11:03 akosiaris@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
* 10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45854 and previous config saved to /var/cache/conftool/dbconfig/20230314-105319-marostegui.json
* 10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: [[phab:T331541|T331541]]
* 10:48 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: [[phab:T331541|T331541]]
* 10:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - [[phab:T331541|T331541]]
* 10:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki-root1001.eqiad.wmnet with OS bullseye
* 10:42 jbond: reimage pki-root1001
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45853 and previous config saved to /var/cache/conftool/dbconfig/20230314-103813-marostegui.json
* 10:33 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - [[phab:T331541|T331541]]
* 10:32 claime: Repooling all active/active services in eqiad - [[phab:T331541|T331541]]
* 10:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
* 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 10:28 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
* 10:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=99)
* 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
* 10:28 claime: Running sre.switchdc.mediawiki.00-optional-warmup-caches - [[phab:T331541|T331541]]
* 10:21 jbond: move pki.discovery.wmnet to pki2002 (bullseye)
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45852 and previous config saved to /var/cache/conftool/dbconfig/20230314-101918-marostegui.json
* 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45851 and previous config saved to /var/cache/conftool/dbconfig/20230314-101840-marostegui.json
* 10:15 jayme: enabling puppet on P:calico::kubernetes for [[phab:T325268|T325268]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45850 and previous config saved to /var/cache/conftool/dbconfig/20230314-100334-marostegui.json
* 10:02 claime: Locking scap deployment for service switchover - [[phab:T331541|T331541]]
* 10:00 claime: Locking scap deployment for service switchover - [[phab:T330651|T330651]]
* 09:56 jayme: disabling puppet on P:calico::kubernetes for [[phab:T325268|T325268]]
* 09:54 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:53 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 09:51 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45849 and previous config saved to /var/cache/conftool/dbconfig/20230314-094828-marostegui.json
* 09:42 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:36 moritzm: installing NSS security updates
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45848 and previous config saved to /var/cache/conftool/dbconfig/20230314-093321-marostegui.json
* 09:32 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:23 Emperor: reboot ms-be2040 [[phab:T331860|T331860]]
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45847 and previous config saved to /var/cache/conftool/dbconfig/20230314-090649-marostegui.json
* 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45846 and previous config saved to /var/cache/conftool/dbconfig/20230314-084249-marostegui.json
* 08:38 vgutierrez: test HAProxy 2.6.10 in cp4044 and cp4045
* 08:31 vgutierrez: fetch haproxy 2.6.10 for thirdparty/haproxy26 (buster && bullseye) @ apt.wm.o
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45845 and previous config saved to /var/cache/conftool/dbconfig/20230314-082743-marostegui.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45843 and previous config saved to /var/cache/conftool/dbconfig/20230314-081236-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45842 and previous config saved to /var/cache/conftool/dbconfig/20230314-075730-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45841 and previous config saved to /var/cache/conftool/dbconfig/20230314-073210-marostegui.json
* 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45840 and previous config saved to /var/cache/conftool/dbconfig/20230314-073149-marostegui.json
* 07:26 marostegui: Migrate db1183 to mariadb m5 eqiad dbmaint 10.6 [[phab:T322294|T322294]]
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45839 and previous config saved to /var/cache/conftool/dbconfig/20230314-071643-marostegui.json
* 07:13 marostegui: Migrate db2135 to mariadb m5 codfw dbmaint 10.6
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45838 and previous config saved to /var/cache/conftool/dbconfig/20230314-070137-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45837 and previous config saved to /var/cache/conftool/dbconfig/20230314-064630-marostegui.json
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts centrallog1001
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 06:41 hashar: gerrit: changed `operations/puppet` merge strategy to allow "content merges" (see `ops` list for the rationale)
* 06:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 06:34 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 06:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts centrallog1001
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45836 and previous config saved to /var/cache/conftool/dbconfig/20230314-061633-marostegui.json
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 06:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:07 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 05:05 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@61ef435]: 0.3.122 (duration: 08m 45s)
* 04:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.122` on canary `wdqs1003`; proceedi