
Server Admin Log: Difference between revisions

imported>Labslogbot
(mattflaschen@tin Synchronized wmf-config/InitialiseSettings-labs.php: Beta-only change (duration: 00m 12s) (logmsgbot))
imported>Stashbot
(hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s))
 
== 2015-09-04 ==
== 2023-03-25 ==
* 23:52 logmsgbot: mattflaschen@tin Synchronized wmf-config/InitialiseSettings-labs.php: Beta-only change (duration: 00m 12s)
* 07:54 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s)
* 23:52 logmsgbot: mattflaschen@tin Synchronized wmf-config/CommonSettings-labs.php: Beta-only change (duration: 00m 11s)
* 07:54 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0
* 22:49 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/Citoid: https://gerrit.wikimedia.org/r/#/c/236218/ and https://gerrit.wikimedia.org/r/#/c/236222/ (duration: 00m 12s)
* 00:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
* 21:55 urandom: bouncing Cassandra on restbase1001 to restore default GC settings
* 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host
* 18:36 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/ukwikivoyage.png: https://gerrit.wikimedia.org/r/#/c/236063/ (duration: 00m 11s)
* 00:57 mutante: doc1002 - issue is mismatched UIDs again, most likely. doc-uploader is debmonitor on new host
* 18:06 logmsgbot: krinkle@tin Synchronized php-1.26wmf21/extensions/WikimediaEvents/modules/ext.wikimediaEvents.statsd.js: Ib98988f67ef (duration: 00m 11s)
* 00:56 mutante: doc1002 - manually running rsync to doc2002 - which failed with status 23 when started by timer
* 17:35 MaxSem: Maps: dropped duplicate index on water_polygons
* 00:09 tzatziki: removing 2 files for legal compliance
* 16:27 jynus: cloning es1 mysql data from es1004 to es1018 [ETA:16h]
* 16:11 paravoid: updating firewall border ACLs and BGP border filters across all cr
* 15:42 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1002, es1016; Depool es1004 (duration: 00m 11s)
* 15:35 godog: python varnishlog collector + gdb running on cp1052 for debugging T83580
* 12:55 moritzm: restarted salt-master on palladium
* 12:47 moritzm: uploaded debdeploy 0.0.4 to carbon
* 10:18 logmsgbot: kartik@tin Synchronized php-1.26wmf21/extensions/ContentTranslation/api/ApiContentTranslationPublish.php: php-1.26wmf21/extensions/ContentTranslation/extension.json T111490:Use the VirtualRESTService to configure CX (duration: 00m 12s)
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-fr-ca_1.0.3~r61329-1
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-fr_0.9.0~r28336-1
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-es_0.9.1~r60655-1
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-eo-ca_0.9.1~r60655-1
* 09:16 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-ca-it_0.1.1~r57554-1
* 07:50 jynus: cloning es3 mysql data from es1008 to es1019
* 04:19 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Sep  4 04:19:20 UTC 2015 (duration 19m 19s)
* 02:26 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-04 02:26:04+00:00
* 02:23 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 21s)
* 01:56 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: T111439 (duration: 00m 12s)
* 00:11 logmsgbot: krinkle@tin Synchronized php-1.26wmf21/includes/resourceloader/ResourceLoader.php: I24f68e34a9fa4918 (duration: 00m 12s)
* 00:06 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235940/ (duration: 00m 11s)


== 2015-09-03 ==
== 2023-03-24 ==
* 23:53 logmsgbot: krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/235853/ (duration: 00m 12s)
* 23:58 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 23:51 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235843/ (duration: 00m 12s)
* 23:57 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 23:50 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: https://gerrit.wikimedia.org/r/#/c/235843/ (duration: 00m 12s)
* 23:50 tzatziki: removing 1 file for legal compliance
* 23:41 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235850/ (duration: 00m 12s)
* 21:08 mutante: mwmaint1002 ferm rules for rsyncd_access from miscweb removed by puppet after {{Gerrit|I4fe17f397856361}} which reverted a8af0339bde14018e8. manually deleted rsyncd config and stopped rsync service. complete noop on mwmaint2002 which is currently the active mwmaint server. [[phab:T328907|T328907]]
* 23:40 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/ukwikivoyage.png: https://gerrit.wikimedia.org/r/#/c/235850/ (duration: 00m 12s)
* 18:50 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable (duration: 00m 13s)
* 23:37 mutante: mw1224 - killed and restarted defunct hhvm, version is different from the one on mw1225
* 18:50 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable
* 23:37 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/235728 (duration: 00m 13s)
* 18:30 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags (duration: 00m 16s)
* 23:36 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/knwikisource.png: https://gerrit.wikimedia.org/r/#/c/235728/ (duration: 00m 12s)
* 18:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags
* 23:32 Krenair: mw1224 has been sending segfault warnings and "Lost parent, LightProcess exiting" to hhvm.log since about 21:17:34
* 18:00 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 20s)
* 23:29 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/CirrusSearch: https://gerrit.wikimedia.org/r/#/c/235905/ (duration: 00m 13s)
* 18:00 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag
* 23:28 logmsgbot: krenair@tin Synchronized php-1.26wmf21/package.json: bd2eb6cc1919c7dab056d5f8fe5b4a164236d78f (duration: 00m 13s)
* 17:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 06s)
* 23:02 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235908/ (duration: 00m 13s)
* 17:55 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag
* 21:21 ori: rebuilt HHVM with updated diff from facebook/hhvm PR #6071 (T109540), uploaded to apt as 3.6.5+dfsg1-1+wm5
* 15:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 21:18 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 15:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 19:54 bearND: MobileApps deployed sha1 553c399
* 15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 19:31 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf21
* 15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 18:13 ottomata: rolling restart of hadoop yarn nodemanagers to pick up Yarn AppMaster port range limitation to apply ferm rules.
* 15:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 18:04 logmsgbot: catrope@tin Synchronized wmf-config/CommonSettings.php: Add plumbing code for Flow beta feature (unused for now) (duration: 00m 12s)
* 15:35 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 18:03 logmsgbot: catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add plumbing code for Flow beta feature (unused for now) (duration: 00m 12s)
* 15:09 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 17:39 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235769/ (duration: 00m 12s)
* 14:59 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 17:34 mutante: bromine - deleting policy docroot
* 14:24 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki wikimaniawiki "2024:Expressions of Interest" "Wikimania:Expressions of Interest" "Zabe" --reason "per request [[:phab:T332917{{!}}T332917]]" # [[phab:T332917|T332917]]
* 17:06 jynus: cloning es1006 mysql data into es1015 [ETA:8h]
* 11:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet
* 16:30 bblack: updating nginx->1.9.4 on cp1071, cp3033 for prod validation before broader rollout
* 11:44 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
* 16:30 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: es3 master switchover from es1009 to es1014 (eqiad) (duration: 00m 13s)
* 11:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 16:28 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: es3 master switchover from es1009 to es1014 (codfw) (duration: 00m 13s)
* 11:01 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 16:26 mutante: imported jenkins 1.609.3 into APT repo
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
* 16:23 legoktm: fixed content model of Template:Languages@metawiki
* 10:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
* 16:21 robh: re-enabling puppet on all mw systems
* 10:35 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 16:14 robh: disabling puppet on all mw systems for apache config update
* 10:00 marostegui: Upgrade db1204 to mariadb 10.6 [[phab:T330861|T330861]]
* 16:01 jynus: performing es3 master switchover from es1009 to es1014
* 08:57 hashar: Fixed up Gerrit > GitHub replication which broke at 5:00 UTC by updating the Github RSA ssh host key [[phab:T332972|T332972]]
* 15:40 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: depool es1006 (duration: 00m 12s)
* 05:37 hashar: gerrit: refreshed ssh host key for `github.com`
* 15:17 hashar: stopping nodepool on labnodepool1001.eqiad.wmnet not ready yet
* 05:28 hashar: Restarted Gerrit
* 15:15 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: es2 master switchover from es1006 to es1011 (eqiad) (duration: 00m 13s)
* 05:26 hashar: Stopping Gerrit
* 15:14 logmsgbot: jynus@tin Synchronized wmf-config/db-codfw.php: es2 master switchover from es1006 to es1011 (codfw) (duration: 00m 12s)
* 05:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) (duration: 00m 10s)
* 15:05 logmsgbot: demon@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s)
* 05:26 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]])
* 15:04 logmsgbot: demon@tin Synchronized php-1.26wmf21/extensions/Translate/: (no message) (duration: 00m 15s)
* 05:22 hashar: Restarting gerrit replica on gerrit2002.wikimedia.org
* 14:51 jynus: performing es2 master switchover from es1006 to es1011
* 05:21 hashar@deploy2002: Finished deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]]) (duration: 00m 07s)
* 14:33 paravoid: rebooting msw1-eqiad
* 05:20 hashar@deploy2002: Started deploy [gerrit/gerrit@c1cbda4]: Update js plugins for EarlyWarning bot ([[phab:T330850|T330850]]) and displaying Zuul status on changes ([[phab:T241068|T241068]])
* 14:28 twentyafterfour: restarted phd (phabricator daemon) to pick up new configuration
* 05:17 hashar: Restarting Gerrit for deploying plugins updates
* 14:25 paravoid: changing IPv6 RA interval/lifetime/virtual-router-only @ eqiad
* 05:10 ejegg: Standalone SmashPig upgraded from {{Gerrit|3b84e4cb}} to {{Gerrit|50139e82}}
* 14:21 paravoid: rebooting msw1-codfw
* 05:04 ejegg: payments-wiki upgraded from {{Gerrit|4d0c90b4}} to {{Gerrit|4b0a71fa}}
* 13:17 paravoid: upgrading mr1-esams and mr1-eqiad to newer junos
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:13 godog: bounce carbon daemons on graphite1001
* 00:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:42 chasemp: unban elastic1001 and put back in service
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:24 chasemp: move all shards off of elastic1001
* 00:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:24 chasemp: disable elastic1001 in lvs as we are going to try fw apply round #2
* 11:02 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1028; increase the load of es1010, es1013 and es1017 (duration: 00m 12s)
* 10:45 jynus: applying schema change for ContentTranslation on x1-master "wikishared"
* 10:02 godog: reenable puppet on ms-be1*
* 09:16 jynus: started profiling mysql queries at phabricator. Only a 1% overhead is expected.
* 09:12 moritzm: updated rsyncd firewall rules (see https://gerrit.wikimedia.org/r/235425 for details)
* 09:12 godog: stop puppet on ms-be1* after ferm rsync change
* 08:23 godog: fixup current graphite retention T96662
* 07:26 moritzm: enabled ferm on dbstore* servers in codfw
* 06:29 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Thu Sep  3 06:29:35 UTC 2015 (duration 29m 34s)
* 03:09 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-03 03:09:20+00:00
* 03:06 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 32s)
* 02:45 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-09-03 02:45:36+00:00
* 02:39 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 10m 41s)
* 01:32 logmsgbot: krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 00:36 logmsgbot: ori@tin Synchronized php-1.26wmf21/includes/parser/Preprocessor_Hash.php: Idd1acd903: Decline to cache preprocessor items larger than 1 Mb (duration: 00m 11s)
* 00:36 logmsgbot: ori@tin Synchronized php-1.26wmf20/includes/parser/Preprocessor_Hash.php: Idd1acd903: Decline to cache preprocessor items larger than 1 Mb (duration: 00m 13s)
* 00:27 RoanKattouw: Deployed patch for T111029


== 2015-09-02 ==
== 2023-03-23 ==
* 23:58 logmsgbot: andyrussg@tin Synchronized php-1.26wmf20/extensions/CentralNotice/: CentralNotice update (duration: 00m 13s)
* 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:33 logmsgbot: andyrussg@tin Synchronized php-1.26wmf21/extensions/CentralNotice/: Update CentralNotice (duration: 00m 13s)
* 22:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 23:02 logmsgbot: andyrussg@tin Finished scap: Update CentralNotice to 2.6.0 for wmf21 (duration: 48m 18s)
* 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 22:13 logmsgbot: andyrussg@tin Started scap: Update CentralNotice to 2.6.0 for wmf21
* 22:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:27 arlolra: updated Parsoid to version 5f2fae6c
* 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:08 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf21
* 22:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 20:02 logmsgbot: krinkle@tin Synchronized php-1.26wmf21/resources/src/startup.js: Ie65427caee (duration: 00m 12s)
* 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:09 mutante: restarted gitblit, stopped counting
* 22:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 19:07 paravoid: upgrading mr1-codfw, mr1-ulsfo to newer junos
* 22:30 mutante: moscovium - rebooting to finalize distro release upgrade - [[phab:T332952|T332952]]
* 19:01 urandom: bouncing Cassandra on restbase1001 to address bogus icinga process failure alert
* 22:20 mutante: moscovium performing apt-get full-upgrade [[phab:T332952|T332952]]
* 18:52 legoktm: deployed patch for T110553
* 22:09 mutante: moscovium - when doing an in-place upgrade from buster to bullseye and you replace the string in sources.list, you also need to replace "bullseye-updates" with "bullseye-security" in the security.debian.org lines - that this is needed is called a bug at https://shagain.club/index.php/archives/641/ - [[phab:T327068|T327068]]
* 18:36 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf21
* 22:00 mutante: moscovium - apt-get full-upgrade ; apt autoremove ; replace buster with bullseye in sources.list ; repeat apt-get upgrade/full-upgrade etc. (https://wiki.debian.org/DebianUpgrade) [[phab:T327068|T327068]]
* 18:32 cmjohnson1: replacing disk 10 on db1028
* 22:00 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc2002.codfw.wmnet with OS bullseye
* 18:13 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 21:57 mutante: moscovium - apt-get upgrade (rt.wikimedia.org going into maintenance) [[phab:T327068|T327068]]
* 17:50 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/VisualEditor/modules/ve-mw/ui/inspectors: https://gerrit.wikimedia.org/r/#/c/235511/ (duration: 00m 12s)
* 21:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
* 17:07 logmsgbot: ori@tin Synchronized php-1.26wmf21/extensions/UniversalLanguageSelector: 78a5908fd9: Updated mediawiki/core Project: mediawiki/extensions/UniversalLanguageSelector (duration: 00m 16s)
* 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on moscovium.eqiad.wmnet with reason: dist-upgrade
* 17:07 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/UniversalLanguageSelector: 2154acc529: Updated mediawiki/core Project: mediawiki/extensions/UniversalLanguageSelector (duration: 00m 13s)
* 21:48 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
* 16:25 mutante: restarting NTP on lvs2004
* 21:45 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc2002.codfw.wmnet with reason: host reimage
* 16:12 jynus: setting BBU auto-learn mode to warn only (disabled if not possible) on all database hosts
* 21:31 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 16:03 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/MultimediaViewer/MultimediaViewer.php: https://gerrit.wikimedia.org/r/#/c/235484/ (duration: 00m 12s)
* 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:01 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: https://gerrit.wikimedia.org/r/#/c/235486/ (duration: 00m 12s)
* 21:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:58 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/MultimediaViewer/MultimediaViewer.php: https://gerrit.wikimedia.org/r/#/c/235483/ (duration: 00m 13s)
* 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:56 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: https://gerrit.wikimedia.org/r/#/c/235485/ (duration: 00m 12s)
* 21:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:51 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: T110837 (duration: 00m 13s)
* 21:25 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 15:42 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235482/ (duration: 00m 12s)
* 21:24 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - [[phab:T332819|T332819]]"
* 15:34 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235479/ (duration: 00m 13s)
* 20:42 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 15:19 logmsgbot: krenair@tin Synchronized php-1.26wmf21/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: https://gerrit.wikimedia.org/r/#/c/235442/ (duration: 00m 12s)
* 20:42 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 15:14 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: https://gerrit.wikimedia.org/r/#/c/235441/ (duration: 00m 12s)
* 20:35 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 15:07 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234942/ and https://gerrit.wikimedia.org/r/#/c/234944/ (duration: 00m 13s)
* 20:34 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 14:40 Nikerabbit: TTMServer reindex complete
* 20:33 taavi@deploy2002: Finished scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] (duration: 10m 56s)
* 11:59 mark: removed tools LV snapshots on labstore1002
* 20:33 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
* 11:47 mark: kill STOP'ed rsync on labstore1002
* 20:24 taavi@deploy2002: abi and taavi: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 11:00 jynus: cloning mysql data from es1002 into es1016 [ETA:16h]
* 20:23 taavi@deploy2002: Started scap: Backport for [[gerrit:902370{{!}}MessageWebImporter: Use translation instead of language code on import (T323430)]]
* 10:30 moritzm: installed qemu security updates on labvirt*
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
* 09:41 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1002 (duration: 00m 12s)
* 19:36 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
* 09:21 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1010, pool es1017 (duration: 00m 13s)
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:19 hashar: Merged in "delete 1.26wmf12" https://gerrit.wikimedia.org/r/235347 which was left unmerged in Gerrit but was present on tin /srv/mediawiki-staging confusing people.
* 19:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 08:03 bblack: restarting ntp on lvs2004
* 19:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 08:01 moritzm: enable ferm on db1069/sanitarium
* 19:31 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 07:50 moritzm: enable ferm on remaining phabricator db hosts
* 19:31 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
* 04:54 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Sep  2 04:54:37 UTC 2015 (duration 54m 36s)
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2002
* 02:52 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-02 02:52:51+00:00
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:50 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 09s)
* 19:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 02:29 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-09-02 02:29:56+00:00
* 19:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 02:26 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 31s)
* 19:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 00:33 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235366/ (duration: 00m 13s)
* 19:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2002
* 18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]
* 17:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 17:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 17:39 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 17:38 mutante: moscovium - systemctl stop rsync
* 17:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 17:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 17:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 17:18 mutante: aphlict1001 - systemctl reset-failed; systemctl start logrotate ; systemctl start logrotate.timer
* 16:59 sukhe: rolling out CR 901333 to A:cp-text [[phab:T313578|T313578]]
* 16:45 sukhe: disable Puppet in A:cp to test and then merge CR 901333
* 16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-main2002.codfw.wmnet with OS bullseye
* 16:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2002.codfw.wmnet with OS bullseye
* 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
* 16:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2002.codfw.wmnet with reason: stop kafka and reimage
* 16:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 16:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 16:01 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:55 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:50 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:37 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc1002.wikimedia.org with OS bullseye
* 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
* 15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1002.wikimedia.org with reason: host reimage
* 15:12 vgutierrez: testing haproxy_2.6.11-1~bpo11+wmf2_amd64.deb in text@ulsfo - [[phab:T332796|T332796]]
* 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc1002.wikimedia.org with OS bullseye
* 14:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
* 14:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host lists1003.wikimedia.org with OS bullseye
* 14:53 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:53 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:51 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
* 14:45 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
* 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1002.wikimedia.org
* 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1003.wikimedia.org with reason: host reimage
* 14:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host lists1003.wikimedia.org with OS bullseye
* 14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1002.wikimedia.org on all recursors
* 14:24 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc1002.wikimedia.org on all recursors
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
* 14:22 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 14:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host pybal-test2003.codfw.wmnet with OS bullseye
* 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
* 14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1002.wikimedia.org - jmm@cumin2002"
* 14:16 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
* 14:15 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
* 14:15 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
* 14:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc1002.wikimedia.org
* 14:13 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 14:13 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
* 14:11 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d] (duration: 01m 32s)
* 14:11 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
* 14:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) 2nd TEST [analytics/refinery@2520d3d]
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d] (duration: 00m 09s)
* 14:09 joal@deploy2002: Started deploy [analytics/refinery@2520d3d] (thin): Hotfix analytics deploy (virtualpageview oozie job) 2nd THIN [analytics/refinery@2520d3d]
* 14:09 joal@deploy2002: Finished deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d] (duration: 05m 10s)
* 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pybal-test2003.codfw.wmnet with reason: host reimage
* 14:03 joal@deploy2002: Started deploy [analytics/refinery@2520d3d]: Hotfix analytics deploy 2nd (virtualpageview oozie job) [analytics/refinery@2520d3d]
* 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
* 13:55 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host pybal-test2003.codfw.wmnet with OS bullseye
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:46 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:46 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac] (duration: 01m 28s)
* 13:46 TheresNoTime: close UTC afternoon backport window
* 13:45 samtar@deploy2002: Finished scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] (duration: 07m 46s)
* 13:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (hadoop-test): Hotfix analytics deploy (virtualpageview oozie job) TEST [analytics/refinery@f4113ac]
* 13:44 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac] (duration: 00m 08s)
* 13:44 joal@deploy2002: Started deploy [analytics/refinery@f4113ac] (thin): Hotfix analytics deploy (virtualpageview oozie job) THIN [analytics/refinery@f4113ac]
* 13:43 joal@deploy2002: Finished deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac] (duration: 13m 06s)
* 13:39 samtar@deploy2002: samtar: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:37 samtar@deploy2002: Started scap: Backport for [[gerrit:902207{{!}}core-Permissions: [dewiki] Add `ipblock-exempt` to `bot` group (T332759)]]
* 13:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] (duration: 08m 05s)
* 13:30 joal@deploy2002: Started deploy [analytics/refinery@f4113ac]: Hotfix analytics deploy (virtualpageview oozie job) [analytics/refinery@f4113ac]
* 13:29 samtar@deploy2002: samtar and sgimeno: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:902131{{!}}GrowthExperiments: disable add a link backend (T304551)]]
* 13:26 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki ckbwiki --fix` [[phab:T332470|T332470]]
* 13:25 samtar@deploy2002: Finished scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] (duration: 08m 39s)
* 13:18 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:16 samtar@deploy2002: Started scap: Backport for [[gerrit:902239{{!}}[trwikiquote] Removing the temporary logo (already reverted) (T329399)]], [[gerrit:902347{{!}}[ckbwiki] Add Draft and Draft_talk namespaces (T332470)]]
* 13:15 samtar@deploy2002: Finished scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] (duration: 11m 47s)
* 13:08 samtar@deploy2002: samtar and superpes: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 13:03 samtar@deploy2002: Started scap: Backport for [[gerrit:902211{{!}}[dkwikimedia] Fixing current logo with an HD version (T332784)]], [[gerrit:902216{{!}}[ptwikinews] Enable wgMinervaEnableSiteNotice (T332813)]]
* 12:14 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-druid1001.eqiad.wmnet with OS bullseye
* 12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:58 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:57 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2004.codfw.wmnet with OS bullseye
* 11:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-druid1001.eqiad.wmnet with reason: host reimage
* 11:47 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache upload cluster - [[phab:T332796|T332796]]
* 11:36 btullis@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-druid1001.eqiad.wmnet with OS bullseye
* 11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
* 11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: host reimage
* 11:26 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host irc2002.wikimedia.org with OS bullseye
* 11:15 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 11:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2004.codfw.wmnet with OS bullseye
* 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
* 11:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2004.codfw.wmnet with reason: stop kafka and reimage
* 11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 11:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
* 10:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2002.wikimedia.org with reason: host reimage
* 10:44 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host irc2002.wikimedia.org with OS bullseye
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2002.wikimedia.org
* 10:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2005.codfw.wmnet with OS bullseye
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2002.wikimedia.org on all recursors
* 10:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache irc2002.wikimedia.org on all recursors
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: host reimage
* 10:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2002.wikimedia.org - jmm@cumin2002"
* 10:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host irc2002.wikimedia.org
* 10:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main2005.codfw.wmnet with OS bullseye
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
* 09:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: stop kafka and reimage
* 09:47 moritzm: uploaded prometheus-druid-exporter 0.8-2 for bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]]
* 08:21 elukey: clean up docker and reboot kubernetes2024 to enable overlay2 - [[phab:T332803|T332803]]
* 08:11 vgutierrez: testing HAProxy 2.6.11 in cp4044 - [[phab:T332796|T332796]]
* 08:08 vgutierrez: fetch haproxy 2.6.11 in apt.wm.o thirdparty/haproxy26 for bullseye & buster
* 08:04 vgutierrez: rolling rollback to HAProxy 2.6.9 in cache text cluster - [[phab:T332796|T332796]]
* 07:54 elukey: clean up docker and reboot kubernetes2023 to enable overlay2 - [[phab:T332803|T332803]]
* 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2023.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes2024.codfw.wmnet with reason: Restart docker with overlay
* 07:42 elukey: clean up docker on kubernetes1024 (cordon + stop kubelet + docker + clean /var/lib/docker/*) and reboot to enable overlay2 - [[phab:T332803|T332803]]
* 07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
* 07:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kubernetes1024.eqiad.wmnet with reason: Restart docker with overlay
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45928 and previous config saved to /var/cache/conftool/dbconfig/20230323-072315-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45927 and previous config saved to /var/cache/conftool/dbconfig/20230323-070811-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45926 and previous config saved to /var/cache/conftool/dbconfig/20230323-065306-root.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45925 and previous config saved to /var/cache/conftool/dbconfig/20230323-063800-root.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45924 and previous config saved to /var/cache/conftool/dbconfig/20230323-062255-root.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45923 and previous config saved to /var/cache/conftool/dbconfig/20230323-060750-root.json
* 05:37 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 05:34 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 04:25 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 02:07 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host doc2002.codfw.wmnet with OS bullseye
* 02:00 mutante: rsyncing ~4GB files for static-codereview.wikimedia.org from old to newer VMs for [[phab:T331896|T331896]] - no automatic sync / deploy for these
* 01:05 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]"
* 01:03 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc1003 - denisse@cumin1001 - [[phab:T332812|T332812]]"
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 00:57 denisse@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host doc2002.codfw.wmnet with OS bullseye
* 00:57 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc2002.codfw.wmnet with OS bullseye
* 00:27 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc2002.codfw.wmnet
* 00:10 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doc1003.eqiad.wmnet with OS bullseye


== 2015-09-01 ==
== 2023-03-22 ==
* 23:59 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/221731/ (duration: 00m 13s)
* 23:59 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
* 23:41 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235285/ (duration: 00m 14s)
* 23:56 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doc1003.eqiad.wmnet with reason: host reimage
* 23:08 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235362/ (duration: 00m 14s)
* 23:46 denisse@cumin1001: START - Cookbook sre.ganeti.reimage for host doc1003.eqiad.wmnet with OS bullseye
* 23:02 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/235361/ (duration: 00m 13s)
* 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc2002.codfw.wmnet on all recursors
* 22:50 awight: update CRM from 0fc8474338e7a31fdde79287bd667b98cd96a252 to abc34b87ee9d1dbb1176f1929a3d748e1ee5ac7b
* 23:34 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc2002.codfw.wmnet on all recursors
* 22:18 MaxSem: Maps: creating and populating admin table
* 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:20 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/235177/ (duration: 00m 12s)
* 23:33 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 20:54 ori: restarted nutcracker on mw1142
* 23:32 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc2002.codfw.wmnet - denisse@cumin1001"
* 20:33 logmsgbot: twentyafterfour@tin Finished scap: sync 1.26wmf21 (duration: 30m 37s)
* 23:32 zabe: zabe@mwmaint2002:~$ mwscript namespaceDupes.php wikimaniawiki --fix # [[phab:T332782|T332782]]
* 20:03 logmsgbot: twentyafterfour@tin Started scap: sync 1.26wmf21
* 23:31 zabe@deploy2002: Finished scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] (duration: 10m 03s)
* 19:52 YuviPanda: removed tools20150901132642 from labstore vg on labstore1002
* 23:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1003.wikimedia.org
* 19:36 logmsgbot: ori@tin Synchronized php-1.26wmf20/includes/skins/SkinTemplate.php: cc643a0934: Deprecate unconditional loading of mediawiki.ui.button on all pages (duration: 00m 13s)
* 23:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 17:31 urandom: bouncing Cassandra on restbase1001 to apply temporary GC setting
* 23:24 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc2002.codfw.wmnet
* 17:28 dcausse: freezing elasticsearch indices before applying ferm rules on master
* 23:22 zabe@deploy2002: zabe: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 17:23 logmsgbot: aude@tin Synchronized php-1.26wmf20/extensions/Wikidata: Fix for change dispatcher (duration: 00m 20s)
* 23:21 zabe@deploy2002: Started scap: Backport for [[gerrit:902208{{!}}wikimaniawiki: Add namespace for 2024 wikimania (T332782)]]
* 16:45 jynus: performing schema change on testwiki and metawiki
* 21:15 taavi: UTC late backports complete
* 16:12 robh: policy.wikimedia.org dns change happening now
* 21:13 taavi@deploy2002: Finished scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] (duration: 07m 29s)
* 16:00 chasemp: ferm for elastic1003/2/1(master)
* 21:08 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc1003.eqiad.wmnet
* 15:57 logmsgbot: krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/235168/ (duration: 00m 13s)
* 21:08 taavi@deploy2002: taavi: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 15:51 YuviPanda: stopped replicate-tools on labstore1002, and cleaned out lockdir
* 21:06 taavi@deploy2002: Started scap: Backport for [[gerrit:902188{{!}}Remove OATHAuthMultipleDevicesMigrationStage from CS]], [[gerrit:902189{{!}}[beta] Write both for OATHAuthMultipleDevicesMigrationStage (T242031)]]
* 15:47 logmsgbot: reedy@tin Synchronized php-1.26wmf20/extensions/SecurePoll/: Stop cronspam (duration: 00m 13s)
* 21:05 taavi@deploy2002: Finished scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] (duration: 07m 17s)
* 15:47 mark: labstore1002: echo 10000 > /sys/block/md123/md/sync_speed_min
* 20:59 taavi@deploy2002: taavi: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 15:44 mark: labstore1002: update-initramfs -k all -u
* 20:58 taavi@deploy2002: Started scap: Backport for [[gerrit:902187{{!}}Set OATHAuthMultipleDevicesMigrationStage in IS]]
* 15:38 mark: labstore1002: mdadm /dev/md/slice51 --add /dev/sd{bh,bg,bf,be,bd,bc}
* 20:54 samtar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:900748{{!}}Enable page tools for anonymous users (T331052)]] (duration: 10m 10s)
* 15:36 moritzm: disabled ferm in analytic1028, needs some more work on possibly dynamic mapreduce ports
* 20:37 akosiaris: uncordon reboot kubernetes1023. It was drained previously for [[phab:T332803|T332803]]
* 15:16 mark: labstore1002: mdadm /dev/md/slice15 --re-add /dev/sd{bb,ba,az}
* 20:36 samtar@deploy2002: Finished scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] (duration: 11m 47s)
* 15:14 mark: labstore1002: mdadm /dev/md/slice15 --re-add /dev/sdaw
* 20:32 akosiaris: reboot kubernetes1023 for a test once more, [[phab:T332803|T332803]]
* 15:07 mark: labstore1002: mdadm --zero-superblock /dev/sd{aw,bh,bg,bf,be,bd,bc,bb,ba,az}1
* 20:32 akosiaris: reboot kubernetes1023 for a test once more
* 15:04 moritzm: enabled ferm in analytic1028 (initial hadoop worker)
* 20:28 samtar@deploy2002: samtar and nray: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:04 mark: labstore1002: mdadm --zero-superblock /dev/sdax1 && mdadm /dev/md/slice15 --re-add /dev/sdax
* 20:25 akosiaris: reboot kubernetes1023 for a test
* 15:03 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/231465/ - VE for all new enwiki accounts (duration: 00m 13s)
* 20:24 samtar@deploy2002: Started scap: Backport for [[gerrit:902150{{!}}Enable pinning for anon main menu when page tools is enabled (T331657)]]
* 14:58 mark: labstore1002: mdadm /dev/md/slice15 --re-add /dev/sday
* 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] (duration: 09m 57s)
* 14:58 mark: labstore1002: mdadm --zero-superblock /dev/sday1
* 20:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists1003.wikimedia.org on all recursors
* 14:53 mark: labstore1002: mdadm --stop /dev/md3
* 20:15 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache lists1003.wikimedia.org on all recursors
* 14:37 ebernhardson: reset elasticsearch cluster.routing.allocation.disk.high back to 90%
* 20:15 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:38 logmsgbot: krinkle@tin Synchronized w/: Remove rl-test.php (duration: 00m 13s)
* 20:15 samtar@deploy2002: kharlan and samtar: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:17 moritzm: enabled ferm on db1048
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:901144{{!}}GrowthExperiments: Enable Leveling Up features on pilot wikis (T330358 T317813)]]
* 13:09 moritzm: enabled ferm on labsdb100[467]
* 20:12 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.eqiad.wmnet on all recursors
* 12:01 YuviPanda: disable puppet on labsdb1006
* 20:11 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.eqiad.wmnet on all recursors
* 08:58 moritzm: enabled ferm on labsdb1001
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:58 godog: fixup current graphite retention for metrics under "servers" hierarchy T96662
* 20:11 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
* 08:51 moritzm: enabled ferm on labsdb1002
* 20:10 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1003.eqiad.wmnet - denisse@cumin1001"
* 08:31 moritzm: enabled ferm on labsdb1003
* 20:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] (duration: 07m 22s)
* 08:29 godog: repool mw1125 mw1142 after nutcracker failures
* 20:07 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 07:45 jynus: cloning mysql data from es1010 to es1017 [ETA: 6h]
* 20:07 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.eqiad.wmnet
* 07:23 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1010 (duration: 00m 12s)
* 20:07 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 07:13 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1007, pool es1013 (duration: 00m 13s)
* 20:07 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1003.wikimedia.org
* 06:36 mutante: uploaded survey2012 to dumps/dataset1001; ownership as it is for survey2011; - T110746 in time for midnight PST
* 20:06 denisse@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doc1003.wikimedia.org
* 05:18 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Tue Sep  1 05:18:09 UTC 2015 (duration 18m 8s)
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
* 02:28 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-09-01 02:28:30+00:00
* 20:06 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
* 02:25 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 00s)
* 20:06 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:05 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1003.wikimedia.org on all recursors
* 20:05 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache doc1003.wikimedia.org on all recursors
* 20:05 denisse@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:04 samtar@deploy2002: samtar and matmarex: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 20:02 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0 (duration: 00m 21s)
* 20:02 samtar@deploy2002: Started scap: Backport for [[gerrit:901723{{!}}Document running persistRevisionThreadItems.php for wgExtraSignatureNamespaces changes (T332745)]], [[gerrit:901724{{!}}Clean up DiscussionTools labs config]]
* 20:02 ebernhardson@deploy2002: Started deploy [airflow-dags/search@822dfed]: bump discolytics to 0.9.0
* 20:01 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 20:01 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host doc1003.wikimedia.org
* 18:16 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]]
* 18:12 mutante: rsyncing /srv/org/wikimedia/sitemaps files for https://sitemaps.wikimedia.org from old to new machines. most other things are auto-deployed by puppet or puppet running initial scap or automatic rsync; this is not. rsync -av /srv/org/wikimedia/sitemaps/ rsync://miscweb2003.codfw.wmnet/miscapps-srv/org/wikimedia/sitemaps/ [[phab:T331896|T331896]] - but also see [[phab:T332101|T332101]]
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dborch1002.wikimedia.org
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
* 17:38 _joe_: stopping apache on mwdebug1001 to test the new envoy error page
* 17:15 hashar@deploy2002: Synchronized composer.json: build: add local typos check to composer.json # [[phab:T332121|T332121]] (duration: 06m 44s)
* 17:12 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dborch1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1001"
* 17:09 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 17:06 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 17:06 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 17:05 jhathaway@cumin1001: START - Cookbook sre.hosts.decommission for hosts dborch1002.wikimedia.org
* 17:05 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 17:04 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:45 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided) (duration: 00m 12s)
* 16:45 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6cbc3bc]: (no justification provided)
* 16:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 16:37 eoghan@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
* 16:37 eoghan@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
* 16:35 vgutierrez: rolling downgrade to HAProxy 2.6.9 in text@esams - [[phab:T332796|T332796]]
* 16:24 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 16:19 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 16:18 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 16:18 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dborch1001.wikimedia.org with OS bullseye
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
* 15:53 moritzm: uploaded druid 0.19.wmf0-2 to bullseye-wikimedia [[phab:T332584|T332584]] [[phab:T332589|T332589]]
* 15:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2004.codfw.wmnet
* 15:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
* 15:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1001.wikimedia.org with reason: host reimage
* 15:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2004.codfw.wmnet
* 15:39 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:31 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:30 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:29 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1001.wikimedia.org with OS bullseye
* 15:27 elukey: `racadm racreset` for kafka-main2004 (no http idrac available for the cookbook, ssh one available)
* 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:26 eoghan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
* 15:25 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2004.codfw.wmnet
* 15:25 eoghan@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
* 15:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 15:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 15:22 hnowlan: removing java packages from maps hosts
* 15:17 eoghan@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 15:17 eoghan@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:13 hnowlan: removing cassandra packages from maps hosts
* 15:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:58 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 14:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
* 14:21 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45917 and previous config saved to /var/cache/conftool/dbconfig/20230322-141923-root.json
* 14:17 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-client1002.eqiad.wmnet with reason: host reimage
* 14:17 sukhe: enable Puppet on A:wikidough to roll out dnsdist.conf change
* 14:13 sukhe: disable Puppet on A:wikidough to roll out dnsdist.conf change
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45916 and previous config saved to /var/cache/conftool/dbconfig/20230322-140418-root.json
* 14:02 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45915 and previous config saved to /var/cache/conftool/dbconfig/20230322-134913-root.json
* 13:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45914 and previous config saved to /var/cache/conftool/dbconfig/20230322-133409-root.json
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45913 and previous config saved to /var/cache/conftool/dbconfig/20230322-131904-root.json
* 13:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG (duration: 00m 12s)
* 13:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a83464d]: Deploying latest country_project_page DAG
* 13:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 13:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45912 and previous config saved to /var/cache/conftool/dbconfig/20230322-130359-root.json
* 13:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:44 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 12:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 12:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:19 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 11:30 marostegui: Poweroff db1121 (lag will show on wikireplicas for s4 section) [[phab:T323961|T323961]]
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool needs to be rebooted [[phab:T323961|T323961]]', diff saved to https://phabricator.wikimedia.org/P45910 and previous config saved to /var/cache/conftool/dbconfig/20230322-112031-root.json
* 11:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 11:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 11:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 11:15 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2005.codfw.wmnet
* 11:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 11:09 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 11:02 jbond: upgraded prometheus-ipmi-exporter on buster and bullseye
* 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:59 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:41 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:36 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:34 elukey: `racadm racreset` for kafka-main2005 - http idrac not available (ssh one works fine)
* 10:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:29 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:26 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main2005.codfw.wmnet
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 10:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2005.codfw.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 10:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1004.eqiad.wmnet with OS bullseye
* 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
* 09:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: host reimage
* 09:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1004.eqiad.wmnet with OS bullseye
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
* 09:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
* 09:23 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-main1004.eqiad.wmnet
* 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-main1004.eqiad.wmnet
* 09:11 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:10 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:02 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1004.eqiad.wmnet
* 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1004.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
* 08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on pybal-test2003.codfw.wmnet with reason: Some tests with pybal/Bullseye
* 08:52 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 08:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:25 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:24 XioNoX: deploy measure-$site.wikimedia.org CNAMES
* 08:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 08:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 08:18 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 08:17 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141082
* 07:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141082
* 00:57 zabe@deploy2002: Finished scap: update interwiki cache (duration: 07m 02s)
* 00:50 zabe@deploy2002: Started scap: update interwiki cache
* 00:47 zabe@deploy2002: Finished scap: [[phab:T332115|T332115]] (duration: 06m 56s)
* 00:40 zabe@deploy2002: Started scap: [[phab:T332115|T332115]]
* 00:40 zabe: create Wikipedia Angika (anpwiki) # [[phab:T332115|T332115]]
* 00:38 zabe@deploy2002: Finished scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] (duration: 27m 00s)
* 00:29 zabe@deploy2002: zabe: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 00:11 zabe@deploy2002: Started scap: Backport for [[gerrit:901652{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901653{{!}}Add namespace translations for Angika (T332118)]], [[gerrit:901651{{!}}Add namespaces, linktrail and digit transform table for Angika (T332118)]]


== 2015-08-31 ==
== 2023-03-21 ==
* 23:56 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/233665/ (duration: 00m 11s)
* 23:46 zabe@deploy2002: Finished scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] (duration: 30m 08s)
* 23:49 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: reenable config changes for cirrus experimental completion api (duration: 00m 12s)
* 23:35 zabe@deploy2002: zabe: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 23:40 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/EducationProgram: 97ab82eab2: Updated mediawiki/core Project: mediawiki/extensions/EducationProgram  85a7d3932c1a4ad28f1a8dd05704f4e524152349 (duration: 00m 14s)
* 23:15 zabe@deploy2002: Started scap: Backport for [[gerrit:901650{{!}}Add messages for Angika Wikipedia (anpwiki) (T332115)]], [[gerrit:901649{{!}}Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)]]
* 23:27 logmsgbot: ebernhardson@tin Synchronized php-1.26wmf20/extensions/CirrusSearch/: (no message) (duration: 00m 12s)
* 23:07 zabe@deploy2002: Finished scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]] (duration: 07m 10s)
* 23:25 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: revert update for cirrussearch experimental suggestions api (duration: 00m 12s)
* 23:00 zabe@deploy2002: Started scap: [[gerrit:901722{{!}}Revert "dewiki: Allow 'crats to remove sysopship and manage importers"]]
* 23:21 logmsgbot: ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: update config of cirrussearch experimental suggestions api (duration: 00m 12s)
* 22:47 ejegg: payments-wiki upgraded from {{Gerrit|0fd66b1f}} to {{Gerrit|ab0a55a2}}
* 22:45 chasemp: disabled puppet on elastic hosts temporarily to safely roll out fw change.  elastic seems to have not taken it well and I'm holding for green cluster state.
* 22:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] (duration: 07m 15s)
* 21:20 mutante: installing package upgrades on argon
* 22:04 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:58 ori: imported pybal_1.08_amd64.changes to jessie-wikimedia
* 22:03 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901712{{!}}[Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)]]
* 20:44 chasemp: ferm for elastic100[4-7] and adjust ferm to include wikitech source
* 21:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 20:21 subbu: deployed parsoid version c3e4df5e
* 21:21 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 16:22 godog: depool mw1125 + mw1142 from api, nutcracker client connections exceeded
* 21:02 AndyRussG: update SmashPig  config {{Gerrit|6e651fd4}} -> {{Gerrit|035f602a}}
* 16:06 logmsgbot: thcipriani@tin Finished scap: SWAT: Ask the user to log in if the session is lost [[gerrit:234228]] (duration: 27m 07s)
* 20:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:59 jynus: restarting hhvm on mw2187
* 20:48 taavi: start [[phab:T315510|T315510]] migration script on group2 s7 wikis
* 15:39 logmsgbot: thcipriani@tin Started scap: SWAT: Ask the user to log in if the session is lost [[gerrit:234228]]
* 20:39 taavi@deploy2002: Finished scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] (duration: 09m 01s)
* 15:33 mutante: terbium - Could not find dependent Service[nscd] for File[/etc/ldap/ldap.conf]
* 20:31 taavi@deploy2002: matmarex and taavi: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:28 logmsgbot: thcipriani@tin Synchronized closed-labs.dblist: SWAT: Creating closed-labs.dblist and closing es.wikipedia.beta.wmflabs.org [[gerrit:234594]] (duration: 00m 13s)
* 20:30 taavi@deploy2002: Started scap: Backport for [[gerrit:901703{{!}}Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config]]
* 15:25 logmsgbot: thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT: Remove files from Commons from search results on wikimediafoundation.org [[gerrit:234040]] (duration: 00m 11s)
* 20:20 taavi@deploy2002: Finished scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] (duration: 17m 40s)
* 15:25 ottomata: starting varnishkafka instances on frontend caches to produce eventlogging client side events to kafka
* 20:10 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 15:21 logmsgbot: thcipriani@tin Synchronized php-1.26wmf20/extensions/Wikidata: SWAT: Update Wikidata - Fix formatting of client edit summaries [[gerrit:234991]] (duration: 00m 21s)
* 20:09 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 15:16 logmsgbot: thcipriani@tin Synchronized php-1.26wmf20/extensions/UploadWizard/resources/controller/uw.controller.Step.js: SWAT: Keep the uploads sorted in the order they were created in initially [[gerrit:234553]] (duration: 00m 12s)
* 20:04 taavi@deploy2002: esanders and taavi and matmarex: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:43 ebernhardson: elasticsearch cluster.routing.allocation.disk.watermark.high set to 75% to force elastic1022 to reduce its disk usage
* 20:02 taavi@deploy2002: Started scap: Backport for [[gerrit:900331{{!}}Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing]], [[gerrit:901697{{!}}Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)]]
* 14:41 urandom: bouncing Cassandra on restbase1001 to apply temporary GC setting
* 19:52 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
* 14:06 akosiaris: rebooted krypton. was reporting 100% cpu steal time
* 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 13:40 paravoid: running puppet on newly-installed mc2001
* 19:43 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 13:40 paravoid: restarting hhvm on mw1065
* 19:41 jhathaway@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host dborch1002.wikimedia.org with OS bullseye
* 11:10 moritzm: restart salt-master on palladium
* 19:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 10:45 paravoid: reenabling asw2-a5-eqiad:xe-0/0/36 (T107635)
* 19:09 dancy@deploy2002: Installation of scap version "4.47.1" completed for 587 hosts
* 10:36 godog: repool ms-fe1004
* 19:07 dancy@deploy2002: Installing scap version "4.47.1" for 587 hosts
* 10:32 godog: repool ms-fe1003 and depool ms-fe1004 for firewall changes
* 19:04 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
* 10:19 godog: update graphite retention policy on files with previous retention and older than 30d T96662
* 19:03 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag (duration: 00m 14s)
* 10:18 godog: repool ms-fe1002 and depool ms-fe1003 for firewall changes
* 19:03 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag
* 10:05 godog: depool ms-fe1002 to apply firewall changes
* 19:01 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
* 09:55 jynus: cloning es1007 mysql data into es1013 (ETA: 5h30m)
* 18:52 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1002.wikimedia.org with OS bullseye
* 09:51 godog: repool ms-fe1001
* 18:38 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
* 09:35 godog: depool ms-fe1001 in preparation for ferm changes
* 18:36 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.1  refs [[phab:T330207|T330207]]
* 09:27 godog: update graphite retention policy on files with previous retention and older than 60d T96662
* 18:00 AndyRussG: update SmashPig config {{Gerrit|59a8b2d2}} -> {{Gerrit|6e651fd}}
* 09:25 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1007 for maintenance (duration: 00m 13s)
* 17:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dborch1002.wikimedia.org
* 08:33 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1028, return ES servers back from maintenance (duration: 00m 12s)
* 17:40 joal@deploy2002: Finished deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] (duration: 00m 11s)
* 04:34 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Mon Aug 31 04:34:14 UTC 2015 (duration 34m 13s)
* 17:39 joal@deploy2002: Started deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b]
* 04:05 bblack: disabled ipv6 autoconf on neon, flushed old dynamic addr
* 17:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-client1002.eqiad.wmnet
* 02:32 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-31 02:32:25+00:00
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 02:29 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 42s)
* 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:53 mutante: sudo cumin -b 4 -s 40 'C:role::cache::text' 'run-puppet-agent'
* 16:50 jbond: copy /usr/bin/prometheus-ipmi-exporter from bullseye to buster
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 16:46 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
* 16:45 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
* 16:43 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 16:43 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
* 16:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:28 jbond: upload prometheus-ipmi-exporter_1.6.1 to bullseye
* 16:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-test-client1002.eqiad.wmnet on all recursors
* 16:15 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-test-client1002.eqiad.wmnet on all recursors
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
* 16:13 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
* 16:10 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
* 16:10 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-client1002.eqiad.wmnet
* 15:57 jynus: running from cumin1001: transfer.py --type=decompress dbprov1003.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s5.2023-03-20--04-00-30.tar.gz db1145.eqiad.wmnet:/srv/sqldata.s5
* 15:53 jhathaway@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.wikimedia.org
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 15:53 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 15:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
* 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
* 15:52 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:42 jbond: stop puppet from deploying this further
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
* 15:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
* 15:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] (duration: 09m 11s)
* 15:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:19 samtar@deploy2002: samtar: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:17 samtar@deploy2002: Started scap: Backport for [[gerrit:900828{{!}}InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)]]
* 15:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:10 samtar@deploy2002: Finished scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] (duration: 09m 32s)
* 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 15:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 15:02 samtar@deploy2002: samtar: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:02 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 15:00 samtar@deploy2002: Started scap: Backport for [[gerrit:901289{{!}}wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)]]
* 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 14:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 14:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=kartotherian,name=maps1005.eqiad.wmnet
* 14:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=maps1005.eqiad.wmnet
* 14:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 14:38 hnowlan: disabling puppet on maps* before merging 760619
* 14:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:27 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:17 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:14 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:10 urbanecm@deploy2002: Finished scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]] (duration: 07m 53s)
* 14:10 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 14:02 urbanecm@deploy2002: Started scap: Backport for [[gerrit:901588{{!}}Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)]]
* 14:00 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:58 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 13:40 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:33 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:28 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:25 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:21 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:16 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
* 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
* 13:05 elukey: move kafka mirror maker instances to PKI migration settings (new truststores) - [[phab:T319372|T319372]]
* 11:20 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
* 11:09 joal: Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00
* 11:08 joal: Kill mediacounts_load oozie job
* 11:07 joal: Unpause mediawiki_history_denormalize airflow job
* 11:06 joal: Kill mediawiki_denormalize oozie job
* 11:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s)
* 11:04 joal@deploy2002: Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b]
* 10:43 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:32 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:24 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s)
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9]
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s)
* 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9]
* 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s)
* 10:14 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9]
* 09:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
* 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
* 09:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
* 09:25 phedenskog@deploy2002: Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s)
* 09:25 phedenskog@deploy2002: Started deploy [performance/navtiming@d2b97ad]: (no justification provided)
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 08:31 elukey: move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - [[phab:T319372|T319372]]
* 06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
* 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
* 03:57 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s)
* 03:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]] (duration: 52m 38s)
* 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.1 refs [[phab:T330207|T330207]]


== 2015-08-30 ==
== 2023-03-20 ==
* 12:58 godog: lvchange -ay labstore/others on labstore1002
* 22:00 samtar@deploy2002: Finished scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] (duration: 09m 45s)
* 12:52 godog: start-nfs on labstore1002
* 21:52 samtar@deploy2002: jdlrobson and samtar: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:31 godog: lvchange -ay labstore/tools on labstore1002
* 21:50 samtar@deploy2002: Started scap: Backport for [[gerrit:901275{{!}}Add languages to Minerva HTML (T331905)]]
* 12:30 godog: also disabled puppet on labstore1002 while investigating
* 21:34 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki shwiki --fix` [[phab:T332614|T332614]]
* 12:15 godog: trying to manually assemble missing raid on labstore1002 with mdadm --assemble /dev/md/slice51 --uuid 0747643d:b89b36ff:57156095:c33694fc --verbose
* 21:25 TheresNoTime: closing UTC late backport window, extended
* 11:19 YuviPanda: powered labstore1002 back up
* 21:22 samtar@deploy2002: Finished scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] (duration: 12m 22s)
* 11:17 YuviPanda: shut down labstore1002, going to powercycle from mgmt
* 21:11 samtar@deploy2002: samtar and aleksandar: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 10:34 YuviPanda: disabled backups on labstore1002 to prevent overwriting of good backups on 2001
* 21:10 samtar@deploy2002: Started scap: Backport for [[gerrit:901276{{!}}Rename project and project talk namespace for shwiki (T332614)]]
* 10:08 YuviPanda: rebooted labstore1002
* 21:09 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer (duration: 00m 13s)
* 04:16 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sun Aug 30 04:16:17 UTC 2015 (duration 16m 16s)
* 21:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer
* 02:23 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-30 02:23:07+00:00
* 21:09 samtar@deploy2002: Finished scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] (duration: 08m 34s)
* 02:20 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 05m 36s)
* 21:02 samtar@deploy2002: matmarex and samtar: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:00 TheresNoTime: extending UTC late backport window
* 21:00 samtar@deploy2002: Started scap: Backport for [[gerrit:898845{{!}}Enable new Vector (2022) "Add topic" button at arwiki (T331313)]], [[gerrit:898846{{!}}Enable DiscussionTools usability improvements at arwiki (T329407)]]
* 20:58 kharlan@deploy2002: Finished scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] (duration: 10m 28s)
* 20:49 kharlan@deploy2002: kharlan: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmn
* 20:47 kharlan@deploy2002: Started scap: Backport for [[gerrit:901146{{!}}TryNewTask: Set an array fallback if TryNewTaskOptOuts is null]], [[gerrit:900685{{!}}PostEdit: Increment the edit-count-for-task-type count (T332319)]], [[gerrit:900684{{!}}LevelingUpManager: Handle links/link-recommendation collision (T332309)]]
* 19:49 mutante: miscweb1003 - manually edit /srv/deployment/iegreview/iegreview-cache/.config and replace tin.eqiad.wmnet with deployment.eqiad.wmnet (which is an alias for deploy2002.codfw.wmnet) [[phab:T257317|T257317]] [[phab:T332623|T332623]] [[phab:T331896|T331896]]
* 19:13 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator (duration: 00m 13s)
* 19:13 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator
* 18:56 ejegg: switched back to new PayPal pending transaction resolver
* 18:48 akosiaris@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 28s)
* 18:47 akosiaris: emergency rollover of redis password complete
* 18:45 akosiaris: re-enable puppet on rdb*, netbox*, ores*, registry*
* 18:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script (duration: 00m 13s)
* 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script
* 18:42 ejegg: civicrm upgraded from {{Gerrit|3d3606f1}} to {{Gerrit|09373b9d}}
* 18:32 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:32 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 18:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:32 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 18:31 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 18:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
* 18:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
* 18:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
* 18:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
* 18:16 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 18:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 18:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 18:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 18:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 18:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 18:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 18:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 18:05 mutante: miscweb1003 - syntax error in httpd config due to "Unknown Authn provider: ldap" - comes from static-rt vhost ([[phab:T331896|T331896]])
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
* 17:59 mutante: when applying the apache role for the first time on new hosts we still have the same old conflict: miscweb1003 - manual "a2dismod mpm_event" to let puppet enable mod PHP ([[phab:T196968|T196968]])
* 17:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
* 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
* 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 17:26 akosiaris: disable puppet on rdb*, netbox*, ores*, registry*
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
* 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
* 16:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 16:36 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 16:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 16:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 16:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 16:21 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 15:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:53 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 14:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2552
* 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2552
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 and promote es2027 to es3 master', diff saved to https://phabricator.wikimedia.org/P45896 and previous config saved to /var/cache/conftool/dbconfig/20230320-143951-root.json
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]]
* 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T326564|T326564]]
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:17 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 14:11 TheresNoTime: close UTC afternoon backport window
* 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
* 14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
* 14:08 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autopatrol' 'autopatrolled'` [[phab:T331762|T331762]]
* 14:06 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 14:05 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreview' 'autopatrol'` [[phab:T331762|T331762]]
* 14:03 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki slwiki --fix` [[phab:T332351|T332351]]
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'reviewer' 'patrol'` [[phab:T331762|T331762]]
* 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreviewer' 'autopatrol'` ("nothing to do") [[phab:T331762|T331762]]
* 14:00 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki ptwikisource editor` [[phab:T331762|T331762]]
* 13:58 samtar@deploy2002: Finished scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] (duration: 09m 44s)
* 13:50 samtar@deploy2002: thiemowmde and samtar and zoranzoki21: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:49 samtar@deploy2002: Started scap: Backport for [[gerrit:776200{{!}}Remove meaningless restriction level "none"]], [[gerrit:900696{{!}}Remove FlaggedRevs from ptwikisource (T331762)]]
* 13:47 samtar@deploy2002: Finished scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] (duration: 09m 26s)
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host cuminunpriv1001.eqiad.wmnet with OS bullseye
* 13:39 samtar@deploy2002: aleksandar and samtar: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:38 samtar@deploy2002: Started scap: Backport for [[gerrit:900675{{!}}SITENAME change of Serbo-Croatian Wikipedia (T332468)]]
* 13:37 samtar@deploy2002: Finished scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] (duration: 08m 46s)
* 13:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
* 13:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
* 13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
* 13:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
* 13:30 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 30s)
* 13:29 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
* 13:29 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to {{Gerrit|a6e9843}}
* 13:28 samtar@deploy2002: Started scap: Backport for [[gerrit:900689{{!}}kuwiktionary: Add wordmark (T326067)]], [[gerrit:900742{{!}}trwikivoyage: Update wordmark (T332439)]]
* 13:28 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:26 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}} (duration: 01m 39s)
* 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
* 13:24 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to {{Gerrit|a6e9843}}
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] (duration: 11m 36s)
* 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host cuminunpriv1001.eqiad.wmnet with OS bullseye
* 13:17 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
* 13:17 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
* 13:14 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 13:14 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 13:08 samtar@deploy2002: stang and samtar: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:06 samtar@deploy2002: Started scap: Backport for [[gerrit:900537{{!}}bewiki: Remove group "autoeditor", "reviewer" (T326012)]], [[gerrit:900690{{!}}slwiki: Create Draft namespace (T332351)]]
* 11:35 krinkle@deploy2002: Synchronized php-1.40.0-wmf.27/includes/libs/rdbms/: (no justification provided) (duration: 15m 28s)
* 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
* 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
* 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
* 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141082
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58655
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58655
* 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2552
* 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2552
* 09:21 claime: Repooling parse2004 - [[phab:T332119|T332119]]
* 08:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 138915
* 08:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 138915
* 08:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138915
* 08:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138915


== 2015-08-29 ==
== 2023-03-19 ==
* 15:26 jynus: killing idle mysql connections from phabricator and setting wait and interactive timeout to 60
* 18:27 AndyRussG: update config (to re-enable old PayPal orphan slayer job) {{Gerrit|27a5b481}} -> {{Gerrit|6359222d}}
* 09:30 jynus: SCAP failed, cannot depool db1028
* 16:44 apergos: dumpsdata1005 conversion to primary dumps nfs server done
* 09:28 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1028, return ES servers back from maintenance (duration: 00m 03s)
* 15:12 AndyRussG: update config (to disable paypal_ec pending transaction resolver) {{Gerrit|5dd37c9c}} -> {{Gerrit|3d3606f1}}
* 09:28 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1028, return ES servers back from maintenance (duration: 00m 03s)
* 14:18 apergos: work starting now to swap dumpsdata1005 in for primary nfs server, replacing dumpsdata1003 which will become dumps spare host
* 09:05 jynus: about to depool db1028 due to disk issue
* 00:17 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
* 04:17 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Sat Aug 29 04:17:55 UTC 2015 (duration 17m 54s)
* 00:17 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 02:24 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-29 02:24:01+00:00
* 02:21 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 05m 48s)


== 2015-08-28 ==
== 2023-03-18 ==
* 23:45 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234679/ (duration: 06m 56s)
* 22:47 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
* 22:51 logmsgbot: bd808@tin Synchronized wmf-config/CommonSettings-labs.php: Use ffmpeg instead of avconv on labs beta (I250fe33) (duration: 06m 05s)
* 22:47 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 22:05 ori: disabling puppet on tin for a few minutes to test an ssh-agent-proxy change
* 14:26 apergos: rsync of xmldata public dir  from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
* 20:04 logmsgbot: catrope@tin Synchronized php-1.26wmf20/resources/src/mediawiki.legacy/shared.css: T110716 (duration: 00m 12s)
* 13:46 apergos: rsync of xmldata private dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
* 18:09 robh: updating ldap-codfw cert
* 07:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 17:10 logmsgbot: catrope@tin Synchronized php-1.26wmf20/extensions/Flow/includes/Parsoid/Utils.php: T110676 (duration: 00m 13s)
* 07:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
* 17:08 urandom: bouncing Cassandra on restbase1001 to apply default (puppet-managed) settings
* 02:57 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
* 16:03 chasemp: ferm for elasticsearch10(0[8-9|1[0-13])
* 02:57 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 15:31 awight: updated crm from fc0fcc8f5af262b56392d3f4f5998f8ea08c99a8 to 0fc8474338e7a31fdde79287bd667b98cd96a252
* 01:21 urandom: powercycling restbase2025 — [[phab:T332462|T332462]]
* 15:23 chasemp: ferm for elasticsearch10[14-17]
* 00:06 AndyRussG: Updating civicrm from {{Gerrit|5dd37c9c}} to {{Gerrit|3d3606f1}}
* 11:09 logmsgbot: aude@tin Synchronized php-1.26wmf20/extensions/Wikidata/Wikidata.php: Sync entry point - updated to work on Jenkins together with ContentTranslation (duration: 00m 12s)
* 10:29 godog: reenable puppet on ms-fe1, ferm changes will go out on monday
* 09:48 jynus: Cloning es1001 database into es1012
* 09:45 moritzm: enabled ferm for swift on esams
* 09:28 moritzm: enabled ferm on strontium puppetmaster backend
* 09:00 moritzm: enabled ferm on rhodium puppetmaster backend
* 08:29 moritzm: uploaded debdeploy 0.0.3 to carbon
* 08:23 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1001, increase weight of es1011, pool es1014 for the first time (duration: 00m 13s)
* 05:59 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Fri Aug 28 05:59:09 UTC 2015 (duration 59m 8s)
* 04:58 logmsgbot: ori@tin Synchronized php-1.26wmf20/includes/parser/Parser.php: 754b222daf: Add ParserOutput cache and expiry times to NewPP report (duration: 00m 13s)
* 02:41 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-28 02:41:26+00:00
* 02:35 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 10m 47s)
* 01:59 Tim: on ruthenium: started parsoid_vd which was previously killed by oom-killer
* 01:58 Tim: on ruthenium, reduced parsoid-rt-client concurrency from 16 to 8 since it was OOM and oom-killer was killing random things
* 01:37 Tim: on ruthenium restarted parsoid-rt-client and parsoid-vd-client
* 00:24 mutante: powercycled mw2027
* 00:19 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/234450/ (duration: 01m 14s)
* 00:06 logmsgbot: krenair@tin Synchronized wmf-config/mobile.php: live hack to make previous commit work (duration: 01m 14s)
* 00:05 Krenair: Another codfw host broke: mw2027
* 00:01 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234330/ (duration: 00m 13s)


== 2015-08-27 ==
== 2023-03-17 ==
* 23:58 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/MobileFrontend/includes/MobileFormatter.php: https://gerrit.wikimedia.org/r/#/c/234331/1 (duration: 00m 12s)
* 19:53 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching (duration: 00m 13s)
* 23:57 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/MobileFrontend/includes/config/Experimental.php: https://gerrit.wikimedia.org/r/#/c/234331/1 (duration: 00m 14s)
* 19:53 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching
* 23:55 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/233439/ (duration: 00m 12s)
* 19:52 bd808: Testing Mastodon account changes. This should post to @wikimedia_sal@botsin.space
* 23:30 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/Gadgets/extension.json: touch (duration: 00m 13s)
* 19:06 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch (duration: 00m 13s)
* 23:24 logmsgbot: krenair@tin Synchronized php-1.26wmf20/includes/DefaultSettings.php: https://gerrit.wikimedia.org/r/#/c/234328/ (duration: 00m 12s)
* 19:06 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch
* 23:24 logmsgbot: krenair@tin Synchronized php-1.26wmf20/includes/registration/ExtensionProcessor.php: https://gerrit.wikimedia.org/r/#/c/234328/ (duration: 00m 12s)
* 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
* 23:23 logmsgbot: krenair@tin Synchronized php-1.26wmf20/includes/MWNamespace.php: https://gerrit.wikimedia.org/r/#/c/234328/ (duration: 00m 13s)
* 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
* 23:15 logmsgbot: krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/234009/ (duration: 00m 13s)
* 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
* 23:04 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233100/ (duration: 00m 12s)
* 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
* 20:11 chasemp: ferm setup on elasticsearch10(1[8-9|2[0-3])
* 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
* 20:06 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf20
* 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
* 19:57 logmsgbot: twentyafterfour@tin Synchronized php-1.26wmf20/includes/media/XMP.php: deploy fix for T89532 on 1.26wmf20 (duration: 00m 13s)
* 18:10 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
* 18:16 chasemp: setting up ferm on elastic1027-31
* 18:09 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 17:47 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/234320/ (duration: 00m 13s)
* 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
* 17:43 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234320/2 (duration: 00m 13s)
* 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
* 17:37 urandom: ack'd Cassandra process alert on restbase1001; temporary command args have pushed the class name beyond the limit
* 17:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
* 17:34 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: (no message) (duration: 00m 12s)
* 17:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
* 17:24 logmsgbot: krenair@tin Synchronized multiversion/MWMultiVersion.php: https://gerrit.wikimedia.org/r/#/c/234320/ (duration: 00m 12s)
* 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet
* 17:08 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet
* 16:51 moritzm: ferm rules on logstash100[1-3] have been amended to allow grafana to read dashboard configs
* 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:39 bd808: new ferm rules on logstash100[1-3] are blocking grafana from reading dashboard configs.
* 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:22 moritzm: ferm enabled on logstash1003
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
* 16:18 moritzm: ferm enabled on logstash1002
* 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
* 16:16 bd808: ferm enabled on logstash1001
* 15:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:06 bd808: logstash1001 back up after system reboot; we applied a default drop rule without applying the other iptables changes; will try again
* 15:29 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 15:58 chasemp: rebooting logstash1001.mgmt.eqiad.wmnet for moritz as it is having issues
* 15:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:47 bblack: killed hung ubuntu mirror rsync commands on carbon, from Jul 10
* 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 15:45 bd808: logstash1001 not responding over ssh following ferm rules application; moritzm investigating
* 14:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 15:30 bd808: Disabled puppet on logstash100[1-3] prior to trying to enable ferm
* 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 15:11 logmsgbot: thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable newarticle campaign in itwiki [[gerrit:234223]] (duration: 01m 52s)
* 14:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 14:52 bblack: re-imaging lvs200[123]
* 14:54 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:47 godog: reenable puppet on ms-be1*
* 14:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 14:22 godog: disable puppet on ms-fe1 / ms-be1 in preparation for puppet work
* 14:13 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 14:15 godog: reenable puppet on ms-fe2*
* 14:05 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 13:47 bblack: re-imaging lvs2004 + lvs2005
* 13:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
* 13:29 ottomata: doing rolling restart of kafka brokers to apply auto_create_topics change
* 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 13:21 godog: enable puppet on ms-be2*
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:21 ottomata: stopping kafka on analytics1021, it is no longer a kafka broker.
* 13:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 13:09 godog: disable puppet on ms-be2* in preparation for firewall changes
* 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 13:09 jynus: cloning es1008 into es1014
* 13:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 13:04 ottomata: running leader election now that all topics and partitions are rebalanced across new kafka nodes
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 12:46 bblack: re-imaging lvs2006
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 12:45 andrewbogott: re-imaging labnet1001 (I hope)
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 11:33 _joe_: restarted hhvm on mw1143, locked in __lll_lock_wait for stat_cache deadlock
* 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
* 11:10 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Pool es1011 for the first time, depool es1008 (duration: 00m 12s)
* 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
* 09:27 jynus: installing and configuring servers es1012-es1019
* 13:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2004.codfw.wmnet
* 06:39 ostriches: tin: dropped useless "gerrit" remote from /srv/mediawiki-staging (uses ssh, lol), pointed {origin,readonly} at the actual repo instead of a redirect.
* 13:21 claime: Depooling parse2004.codfw.wmnet for broken PSU - [[phab:T332119|T332119]]
* 06:00 _joe_: powercycling mw2140, not responding to ping, blank console
* 12:06 mutante: systemctl reset-failed on gitlab-runner*
* 03:17 awight: deploy config cleanup for paymentswiki
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 02:38 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 10m 44s)
* 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 02:16 awight: push config change to the payments orphan slayer: explicitly give stomp port to work around strict notice, clean up unused globals. T109911
* 11:03 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 01:32 ejegg: updated payments from 8ba4b5299f195cf48e6809b18a21e2d53f6eec1b to 6ac552f280fb839069d117386c4ecbe9e52f90a8
* 11:02 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 00:31 twentyafterfour: finished phabricator upgrade, everything appears to be working
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:24 logmsgbot: aaron@tin Synchronized php-1.26wmf19/extensions/CentralAuth: 47e181adb2898977b146de7398eaa35aebb870e3 (duration: 01m 13s)
* 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:22 logmsgbot: aaron@tin Synchronized php-1.26wmf20/extensions/CentralAuth: 47e181adb2898977b146de7398eaa35aebb870e3 (duration: 01m 13s)
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 00:20 twentyafterfour: taking phabricator offline for scheduled upgrade
* 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl', diff saved to https://phabricator.wikimedia.org/P45887 and previous config saved to /var/cache/conftool/dbconfig/20230317-055643-marostegui.json
* 02:10 ejegg: civicrm upgraded from {{Gerrit|672950d9}} to {{Gerrit|5dd37c9c}}
* 01:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2010.codfw.wmnet
* 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2010.codfw.wmnet
* 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
* 00:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
* 00:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
* 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
* 00:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates
* 00:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates


== 2015-08-26 ==
== 2023-03-16 ==
* 23:59 Krinkle: mwscript deleteEqualMessages.php --wiki rowiki
* 23:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
* 23:57 yurik: git deployed tilerator - had the 4/5 issue - https://phabricator.wikimedia.org/T110434
* 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
* 23:46 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234072/ (duration: 01m 12s)
* 23:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
* 23:37 logmsgbot: krenair@tin Synchronized php-1.26wmf20/maintenance/deleteEqualMessages.php: https://gerrit.wikimedia.org/r/#/c/234038/ (duration: 01m 12s)
* 23:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
* 23:35 logmsgbot: krenair@tin Synchronized php-1.26wmf19/maintenance/deleteEqualMessages.php: https://gerrit.wikimedia.org/r/#/c/234037/1 (duration: 01m 12s)
* 23:31 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb2003.codfw.wmnet with OS bullseye
* 23:27 yurik: deployed kartotherian
* 23:28 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb1003.eqiad.wmnet with OS bullseye
* 23:21 jynus: cloning es1005 into es1011, ETA 9 hours
* 23:20 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 (duration: 00m 19s)
* 22:41 ori: armed keyholder on tin
* 23:20 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0
* 22:40 ori: Disabled Puppet on mw1017 for 2hrs and applied I059b0c96c9 for testing.
* 23:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
* 21:55 logmsgbot: krinkle@tin Synchronized php-1.26wmf19/includes/poolcounter/PoolWorkArticleView.php: (no message) (duration: 01m 12s)
* 23:15 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
* 21:48 logmsgbot: jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1005 (duration: 01m 12s)
* 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
* 21:40 logmsgbot: krinkle@tin Synchronized php-1.26wmf20/includes/poolcounter/PoolWorkArticleView.php: (no message) (duration: 01m 12s)
* 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
* 21:32 ori: Disabling Puppet on tin again to test an ssh-agent-proxy change
* 23:01 dzahn@cumin1001: START - Cookbook sre.ganeti.reimage for host miscweb1003.eqiad.wmnet with OS bullseye
* 20:30 logmsgbot: ori@tin Synchronized README: testing ssh-agent-proxy changes (duration: 00m 13s)
* 23:00 dzahn@cumin2002: START - Cookbook sre.ganeti.reimage for host miscweb2003.codfw.wmnet with OS bullseye
* 20:25 ori: Disabling puppet on tin and hacking some debug logging into ssh-agent-proxy
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb1003.eqiad.wmnet
* 20:24 ori: armed ssh-agent key on mira
* 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb2003.codfw.wmnet
* 20:21 logmsgbot: krinkle@tin Synchronized php-1.26wmf20/includes/poolcounter/PoolWorkArticleView.php: (no message) (duration: 00m 03s)
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb1003.eqiad.wmnet on all recursors
* 20:11 subbu: deployed parsoid version 44d657de
* 22:39 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache miscweb1003.eqiad.wmnet on all recursors
* 19:52 logmsgbot: krenair@tin Synchronized php-1.26wmf20/extensions/Echo/includes/mapper/EventMapper.php: https://gerrit.wikimedia.org/r/#/c/234082/ (duration: 00m 12s)
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:47 mutante: sodium - deleting shunted messages older than 7 days
* 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
* 19:23 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234042/ (duration: 00m 12s)
* 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
* 19:22 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/234024/ (duration: 00m 12s)
* 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 19:20 logmsgbot: krenair@tin Synchronized multiversion/MWWikiversions.php: https://gerrit.wikimedia.org/r/#/c/232672/ (duration: 00m 12s)
* 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host miscweb1003.eqiad.wmnet
* 18:50 logmsgbot: krinkle@tin Synchronized php-1.26wmf20/maintenance/deleteEqualMessages.php: (no message) (duration: 00m 11s)
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb2003.codfw.wmnet on all recursors
* 18:50 logmsgbot: krinkle@tin Synchronized php-1.26wmf19/maintenance/deleteEqualMessages.php: (no message) (duration: 00m 13s)
* 22:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache miscweb2003.codfw.wmnet on all recursors
* 18:38 twentyafterfour: ^ stupid typo. That sync was group1 to 1.26wmf20
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:37 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: tig
* 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
* 18:31 logmsgbot: ori@tin Synchronized w/404.php: Ided1facc0: Remove auto-redirection from 404 page. (duration: 00m 13s)
* 22:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
* 17:51 ejegg: updated SmashPig from 258f2c917b1ae50b01231927bcd6f58ecaa8940b to fdb053efa617162ac9f695e493c390987a069140
* 22:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 17:30 urandom: bouncing Cassandra on restbase1001 to apply temporary GC setting
* 22:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host miscweb2003.codfw.wmnet
* 17:12 andrewbogott: ok, /now/ I’m running a dist-upgrade on labcontrol1001, to sort out weird oslo dependencies
* 22:24 ejegg: civicrm upgraded from {{Gerrit|68fa85cf}} to {{Gerrit|672950d9}}
* 17:09 chasemp: adding firewall to elasticsearch2[4-6] (3 was just done as a pilot)
* 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 17:03 andrewbogott: upgraded labnet1002 nova services to Juno
* 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:34 andrewbogott: stopping keystone, updating db, restarting
* 22:04 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 16:18 andrewbogott: switching labcontrol1001 hiera to Juno which will add the cloud-archive repo for Juno.
* 21:54 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:11 andrewbogott: backing up labs openstack databases into /home/andrew/openstackdbbackups on db1009
* 20:47 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]]
* 16:11 andrewbogott: starting labs openstack update to Juno
* 20:36 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): blockers hopefully resolved, rolling to all wikis
* 15:53 moritzm: ferm enabled on elastic1023
* 20:35 TheresNoTime: close UTC late backport window
* 15:45 godog: repool restbase1009 in pybal
* 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] (duration: 08m 18s)
* 15:28 logmsgbot: thcipriani@tin Synchronized php-1.26wmf20/extensions/Wikidata: SWAT: Update Wikidata - wrap usage tracking batch updates in transaction [[gerrit:233970]] (duration: 00m 23s)
* 20:28 samtar@deploy2002: samtar and sharvaniharan: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 13:47 andrewbogott: rebooting/reimaging labnet1001
* 20:26 samtar@deploy2002: Started scap: Backport for [[gerrit:900399{{!}}Remove sampling from breadCrumbs schema]]
* 13:11 mobrovac: restbase deploying 1dfba85
* 20:21 brennen@deploy2002: Finished scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] (duration: 09m 06s)
* 12:54 yurik: git synced kartotherian
* 20:14 brennen@deploy2002: brennen and jforrester: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 11:02 jynus: dropping optin_survey_old table on all wikis
* 20:12 brennen@deploy2002: Started scap: Backport for [[gerrit:900427{{!}}Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)]]
* 10:33 godog: reenable puppet on ms-fe/ms-be, base::firewall still not enabled
* 19:28 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s)
* 09:58 godog: test-reboot ms-be2001
* 19:27 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided)
* 08:17 godog: disable puppet on ms-be/ms-fe in preparation for merging firewall changes
* 18:41 wfan: enable monthlyconvert for cz
* 07:53 logmsgbot: l10nupdate@tin ResourceLoader cache refresh completed at Wed Aug 26 07:53:31 UTC 2015 (duration 53m 30s)
* 18:40 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s)
* 07:01 jynus: restarting mw1239 HHVM, which is unresponsive
* 18:40 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided)
* 04:47 logmsgbot: ori@tin Synchronized wmf-config: I73721936: Enable ParsoidBatchAPI everywhere (duration: 00m 13s)
* 18:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet
* 03:11 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-26 03:11:29+00:00
* 18:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 03:06 logmsgbot: awight@tin Synchronized wmf-config/InitialiseSettings-labs.php: Push labs config to keep in sync with master (duration: 00m 13s)
* 18:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet
* 03:05 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 10m 45s)
* 18:03 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet
* 02:37 logmsgbot: l10nupdate@tin LocalisationUpdate completed (1.26wmf19) at 2015-08-26 02:37:51+00:00
* 17:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
* 02:34 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 29s)
* 17:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
* 02:00 ottomata: kafka topic webrequest_upload has finished rebalancing across new brokers. starting move of last topic webrequest_text
* 17:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 01:50 logmsgbot: mattflaschen@tin Synchronized php-1.26wmf19/extensions/Flow/: Sync Flow for reply fix (duration: 00m 15s)
* 17:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
* 00:28 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/Scribunto/engines/LuaCommon/LuaCommon.php: (no message) (duration: 00m 13s)
* 17:40 ayounsi@cumin2002: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
* 00:26 logmsgbot: ori@tin Synchronized php-1.26wmf19/extensions/Scribunto/engines/LuaCommon/LuaCommon.php: (no message) (duration: 00m 13s)
* 17:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 00:26 Danny_B: 2586dd1c7c obviously broke many pages
* 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
* 00:19 logmsgbot: ori@tin Synchronized php-1.26wmf19/extensions/Scribunto/engines/LuaCommon/LuaCommon.php: (no message) (duration: 00m 14s)
* 17:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
* 00:14 logmsgbot: ori@tin Synchronized wmf-config/CommonSettings.php: I79ffa78fa: Collection/OCG: Turn on plain text output format in Book Creator (duration: 00m 12s)
* 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 00:12 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/Scribunto/engines/LuaCommon/LuaCommon.php: 2586dd1c7c: Updated mediawiki/core Project: mediawiki/extensions/Scribunto (duration: 00m 13s)
* 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s)
* 16:58 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade.
* 16:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
* 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
* 16:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
* 16:31 Emperor: reboot ms-be2067 again to see if the missing drive comes back
* 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
* 15:39 claime: Pooled new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]]
* 15:28 sukhe: enable puppet on R:class = dnsrecursor to merge CR: 898957 [done]
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
* 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
* 15:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
* 15:15 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
* 15:15 claime: Pooling new mw hosts mw24[20-51].codfw.wmnet - [[phab:T326363|T326363]]
* 15:13 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
* 15:12 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
* 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
* 15:10 sukhe: disable puppet on R:class = dnsrecursor to merge CR: 898957
* 15:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts
* 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 32 hosts
* 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 14:49 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 14:44 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:06 urandom: ALTER-ing image_suggestions.suggestion table — [[phab:T328670|T328670]]
* 13:35 kostajh: UTC afternoon deploys done
* 13:34 kharlan@deploy2002: Finished scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] (duration: 07m 44s)
* 13:28 kharlan@deploy2002: kharlan: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:894593{{!}}GrowthExperiments: Remove unused GENewImpactD3Enabled flag]]
* 13:15 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] (duration: 09m 48s)
* 13:07 kharlan@deploy2002: kharlan: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:05 kharlan@deploy2002: Started scap: Backport for [[gerrit:900196{{!}}GrowthExperiments: Enable LevelingUp features on testwiki (T317813)]]
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
* 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 12:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
* 11:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
* 11:43 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
* 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
* 11:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
* 11:27 hnowlan@puppetmaster1001: conftool action : set/weight=3; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 32 hosts with reason: new_install
* 11:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 32 hosts with reason: new_install
* 11:10 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
* 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
* 11:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
* 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
* 10:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:38 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
* 10:33 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
* 10:32 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
* 10:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:30 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:28 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 to move it to x1', diff saved to https://phabricator.wikimedia.org/P45885 and previous config saved to /var/cache/conftool/dbconfig/20230316-100945-root.json
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1105.eqiad.wmnet
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:49 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
* 08:48 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1105.eqiad.wmnet
* 08:40 kostajh: UTC morning deploys (second round) done
* 08:40 kharlan@deploy2002: Finished scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] (duration: 12m 30s)
* 08:29 kharlan@deploy2002: kharlan: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:27 kharlan@deploy2002: Started scap: Backport for [[gerrit:900126{{!}}SuggestedEditSession: Fix handling of post-save data refresh]], [[gerrit:899605{{!}}Leveling up: always set wgGELevelingUpEnabledForUser (T332227)]]
* 08:11 apergos: additional deployments for the UTC morning backport and config training window, running into the next hour, so window re-opened
* 07:36 tgr_: UTC morning deploys done
* 07:34 tgr@deploy2002: Finished scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] (duration: 08m 13s)
* 07:28 tgr@deploy2002: tgr: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:26 tgr@deploy2002: Started scap: Backport for [[gerrit:900026{{!}}Leveling up: Backport recent changes]]
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105 from dbctl [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45883 and previous config saved to /var/cache/conftool/dbconfig/20230316-062307-root.json
* 06:03 marostegui: Failover m5 from db1106 to db1176 - [[phab:T332155|T332155]]
* 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]]
* 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T332155|T332155]]
* 03:29 ejegg: payments-wiki upgraded from {{Gerrit|1532b107}} to {{Gerrit|0fd66b1f}}


== 2015-08-25 ==
== 2023-03-15 ==
* 23:39 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233860/ (duration: 00m 12s)
* 22:55 tzatziki: Removing 1 file for legal compliance
* 23:16 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233872/ (duration: 00m 13s)
* 22:30 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 55s)
* 23:13 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/232963/ (duration: 00m 12s)
* 22:29 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]])
* 23:12 logmsgbot: krenair@tin Synchronized wmf-config/extension-list: https://gerrit.wikimedia.org/r/#/c/232963/ (duration: 00m 12s)
* 22:29 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]]) (duration: 00m 28s)
* 23:10 logmsgbot: krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/232962/ (duration: 00m 12s)
* 22:28 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee ([[phab:T331915|T331915]])
* 23:10 logmsgbot: krenair@tin Synchronized wmf-config/extension-list: https://gerrit.wikimedia.org/r/#/c/232962/ (duration: 00m 12s)
* 22:08 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str (duration: 00m 14s)
* 23:05 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233781/ (duration: 00m 12s)
* 22:07 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str
* 22:20 cscott: updated Parsoid to version c3b037b0
* 21:59 brennen: end of phabricator update window ([[phab:T331915|T331915]])
* 22:10 ejegg: disabled paypal audit downloader and parser due to them warning of incorrect data
* 21:47 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 40s)
* 21:16 logmsgbot: ori@tin Synchronized php-1.26wmf19/extensions/AbuseFilter: I15f5b5b6 & I9c23b607 (duration: 00m 13s)
* 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]])
* 21:13 logmsgbot: ori@tin Synchronized php-1.26wmf19/extensions/Cite/modules/ext.cite.styles.css: 7344e02216: Updated mediawiki/core Project: mediawiki/extensions/Cite (duration: 00m 12s)
* 21:46 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]]) (duration: 00m 28s)
* 21:09 logmsgbot: ori@tin Synchronized php-1.26wmf20/extensions/AbuseFilter: I15f5b5b6 & I9c23b607 (duration: 00m 14s)
* 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message ([[phab:T331915|T331915]], [[phab:T155130|T155130]])
* 20:54 tgr: finished OAuth migration
* 21:26 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]]) (duration: 00m 52s)
* 20:34 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings.php: make OAuth DB writable again T108648 (duration: 00m 12s)
* 21:25 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 ([[phab:T331915|T331915]])
* 20:32 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings.php: change wgMWOAuthCentralWiki mediawikiwiki -> metawiki T108648 (duration: 00m 12s)
* 21:19 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] (duration: 00m 11s)
* 20:24 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings.php: set OAuth to readonly for DB migration T108648 (duration: 00m 13s)
* 21:19 milimetric@deploy2002: Started deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893]
* 20:13 subbu: deployed parsoid version 759916fc
* 21:13 mutante: phab* - upgrading PHP packages
* 19:24 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf20
* 21:13 mutante: phabricator - maintenance window starting - expect possible downtime
* 19:21 logmsgbot: twentyafterfour@tin Finished scap: testwiki to 1.26wmf20 (duration: 50m 12s)
* 21:08 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
* 18:31 logmsgbot: twentyafterfour@tin Started scap: testwiki to 1.26wmf20
* 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
* 17:11 YuviPanda: run authdns-update on radon (ns0.wikimedia.org)
* 20:56 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]]) (duration: 00m 31s)
* 17:10 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 20:55 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 ([[phab:T331915|T331915]])
* 16:58 Krinkle: mwscript deleteEqualMessages.php --wiki kawiki
* 20:54 brennen: starting phabricator window a touch early with a test deploy to phab2002
* 16:56 andrewbogott: restarting pdns on labcontrol1001 and labcontrol2001 to handle a nembus reboot
* 20:51 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor (duration: 00m 16s)
* 16:53 Krinkle: mwscript deleteEqualMessages.php --wiki huwiki
* 20:51 ebernhardson@deploy2002: Started deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor
* 16:31 Krinkle: mwscript deleteEqualMessages.php --wiki frwiki
* 20:48 TheresNoTime: close UTC late backport window
* 16:17 Krinkle: mwscript deleteEqualMessages.php --wiki frpwiki
* 20:48 samtar@deploy2002: Finished scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] (duration: 08m 46s)
* 15:50 godog: powercycle ms-be1004, likely xfs
* 20:41 samtar@deploy2002: matmarex and samtar and esanders: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 15:44 andrewbogott: dist-upgrade and rebooting nembus in an attempt to resolve this acpi_pad issue
* 20:39 samtar@deploy2002: Started scap: Backport for [[gerrit:899693{{!}}Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407)]], [[gerrit:899726{{!}}Clean up DiscussionTools config for mediawikiwiki]]
* 15:36 Krinkle: mwscript deleteEqualMessages.php --wiki euwiki (T45917)
* 20:35 samtar@deploy2002: Finished scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] (duration: 10m 30s)
* 15:29 Krinkle: mwscript deleteEqualMessages.php --wiki eowiki (T45917)
* 20:33 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3002.wikimedia.org with OS bullseye
* 15:07 logmsgbot: krenair@tin Synchronized php-1.26wmf19/extensions/Flow: https://gerrit.wikimedia.org/r/#/c/233718/ (duration: 00m 16s)
* 20:27 samtar@deploy2002: samtar and tsepothoabala: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:56 jynus: dropping old tables on s7 - T5493
* 20:25 samtar@deploy2002: Started scap: Backport for [[gerrit:896900{{!}}Deploy action blocks on itwiki (T330533)]]
* 13:48 jynus: dropping old tables on s6 - T54932
* 20:23 samtar@deploy2002: Finished scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] (duration: 10m 12s)
* 12:53 Jeff_Green: authdns-update to change bismuth's IP
* 20:20 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bullseye
* 11:16 jynus: dropping old tables on s3 - T54932
* 20:17 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bullseye
* 10:46 jynus: dropping old tables on s2 - T54932
* 20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
* 10:05 YuviPanda: restart puppetmaster on labcontrol1001 for https://gerrit.wikimedia.org/r/#/c/233184/
* 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
* 07:35 _joe_: stopping redis, wiping aof, restarting redis on rdb100{1,2} - snapshot saved on rdb1002:/root
* 20:15 samtar@deploy2002: sgimeno and samtar: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 07:12 _joe_: stopping redis on rdb1003,4, wiping AOF, restarting
* 20:13 samtar@deploy2002: Started scap: Backport for [[gerrit:899673{{!}}GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363{{!}}GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]]
* 06:38 jynus: performing schema change on officewiki, mediawikiwiki and metawiki
* 20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
* 02:21 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 26s)
* 20:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 14s)
* 01:48 ottomata: starting move of kafka partitions for topic webrequest_upload to new brokers. this will take a while!
* 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries
* 01:44 ottomata: restarting kafka on new brokers kafka1013,1014,1020 to apply increase in num.replica.fetchers
* 20:11 taavi: deploy patch for [[phab:T331192|T331192]]
* 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
* 20:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
* 20:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
* 19:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
* 19:54 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3002.wikimedia.org with OS bullseye
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
* 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
* 19:53 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3001.wikimedia.org with OS bullseye
* 19:50 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 19:49 taavi@deploy2002: Finished scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] (duration: 12m 04s)
* 19:48 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1002.wikimedia.org with OS bullseye
* 19:47 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 19:46 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bullseye
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
* 19:45 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2002.wikimedia.org with OS bullseye
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bullseye
* 19:41 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bullseye
* 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
* 19:39 taavi@deploy2002: taavi: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:37 taavi@deploy2002: Started scap: Backport for [[gerrit:899736{{!}}extdist: Add REL1_40 (T329085)]]
* 19:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
* 19:35 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
* 19:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
* 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
* 19:32 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
* 19:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
* 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
* 19:28 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
* 19:27 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
* 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
* 19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
* 19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
* 19:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1001.wikimedia.org with OS bullseye
* 19:16 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2001.wikimedia.org with OS bullseye
* 19:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bullseye
* 19:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3001.wikimedia.org with OS bullseye
* 19:05 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6002.wikimedia.org with OS bullseye
* 19:03 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bullseye
* 18:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
* 18:49 mutante: adding new language prefix anp.wikipedia.org - Angika, an Eastern Indo-Aryan language spoken in some parts of the Indian states of Bihar and Jharkhand, as well as in parts of Nepal. ([[phab:T332115|T332115]])
* 18:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
* 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
* 18:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
* 18:25 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6001.wikimedia.org with OS bullseye
* 18:24 brennen@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]] (duration: 06m 08s)
* 18:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 18:19 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5002.wikimedia.org with OS bullseye
* 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.27 refs [[phab:T330205|T330205]]
* 18:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 05s)
* 18:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries
* 18:06 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group1.
* 18:04 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5001.wikimedia.org with OS bullseye
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
* 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
* 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
* 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
* 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
* 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.wmnet
* 17:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2006.codfw.wmnet
* 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bullseye
* 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2006.codfw.wmnet
* 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2004.codfw.wmnet
* 17:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2004.codfw.wmnet
* 17:29 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
* 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
* 17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
* 17:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
* 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
* 17:12 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5001.wikimedia.org with OS bullseye
* 17:05 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4001.wikimedia.org with OS bullseye
* 16:19 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 16:19 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 16:17 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 16:17 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 16:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
* 16:02 hnowlan: restarted thumbor-instances on thumbor1006
* 16:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 15:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
* 15:52 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
* 15:49 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
* 15:44 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4002.wikimedia.org with OS bullseye
* 15:34 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
* 15:33 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 15:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 15:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 15:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 15:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 15:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
* 14:54 Emperor: depool moss-fe1001 as rate of token denial is too high
* 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
* 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
* 14:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
* 14:53 claime: Redeploying mw-on-k8s for php7.4 update [[phab:T330270|T330270]]
* 14:52 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 14:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:46 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 14:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:41 cgoubert@deploy2002: Started scap: (no justification provided)
* 14:41 claime: Rebuilding mw-on-k8s images - [[phab:T330270|T330270]]
* 14:38 claime: Updating php7.4 production images
* 14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 14:34 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
* 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
* 14:24 daniel@deploy2002: Finished scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] (duration: 09m 57s)
* 14:22 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 14:22 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 14:22 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=pki
* 14:22 jbond: switch pki to be active active
* 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 14:20 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 14:19 jbond: update pki to use discovery record
* 14:16 jbond@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=pki
* 14:15 daniel@deploy2002: daniel: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 14:14 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4002.wikimedia.org with OS bullseye
* 14:14 daniel@deploy2002: Started scap: Backport for [[gerrit:898795{{!}}Always write parsoid output to parser cache. (T320534)]]
* 14:12 sukhe: [correction] depool _doh4002_ for reimaging to bullseye: [[phab:T321309|T321309]]
* 14:12 sukhe: depool dns4002 for reimaging to bullseye: [[phab:T321309|T321309]]
* 14:00 moritzm: nodejs security updates on buster
* 13:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS bullseye
* 13:50 sukhe: reprepro -C component/pdns-recursor include bullseye-wikimedia pdns-recursor_4.6.2-1+wmf11u1_amd64.changes: [[phab:T321309|T321309]]
* 13:49 moritzm: installing graphite-web security updates
* 13:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
* 13:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:28 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 13:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 13:27 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 13:27 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 13:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
* 13:26 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:25 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:24 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:22 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:21 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:20 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 13:17 taavi@deploy2002: Finished scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] (duration: 09m 01s)
* 13:12 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS bullseye
* 13:10 taavi@deploy2002: matmarex and taavi and esanders: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebu
* 13:08 taavi@deploy2002: Started scap: Backport for [[gerrit:898843{{!}}Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313)]], [[gerrit:898844{{!}}Enable DiscussionTools usability improvements at cswiki, huwiki (T329407)]], [[gerrit:897912{{!}}Disable visual enhancements on newsectionlink pages initially (T331635)]]
* 13:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 13:07 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 12:18 marostegui: Failover m5 from db1176 to db1106 - [[phab:T331877|T331877]]
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]]
* 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch [[phab:T331877|T331877]]
* 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 11:36 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 11:34 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
* 11:32 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 11:30 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
* 11:27 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 11:26 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
* 11:20 moritzm: imported packages into thirdparty/ceph-quincy
* 11:16 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 11:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
* 11:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
* 11:00 claime: Redirecting test.wikidata.org to mw-on-k8s - [[phab:T331268|T331268]]/25
* 10:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:29 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:28 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:25 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:24 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:23 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:22 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:22 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:21 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:20 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:19 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 10:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
* 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
* 10:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
* 10:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
* 10:08 jayme@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
* 10:08 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 09:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
* 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
* 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 09:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
* 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
* 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
* 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
* 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
* 09:49 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
* 09:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
* 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
* 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
* 09:45 jayme@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
* 09:39 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
* 09:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 09:26 moritzm: rolling restart of FPM/Apache to pick up gnutls28 security updates
* 09:22 moritzm: installing gnutls28 security updates
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl [[phab:T331875|T331875]]', diff saved to https://phabricator.wikimedia.org/P45872 and previous config saved to /var/cache/conftool/dbconfig/20230315-090515-root.json
* 08:40 hashar@deploy2002: Finished deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]] (duration: 00m 19s)
* 08:40 hashar@deploy2002: Started deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - [[phab:T222199|T222199]]
* 08:15 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
* 07:40 tgr_: UTC morning deploys done
* 07:39 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2067.codfw.wmnet
* 07:36 tgr@deploy2002: Finished scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] (duration: 07m 54s)
* 07:30 tgr@deploy2002: tgr: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:28 tgr@deploy2002: Started scap: Backport for [[gerrit:898869{{!}}LevelingUpManager: Ensure that $suggestions is a TaskSet]]
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) [[phab:T331874|T331874]]', diff saved to https://phabricator.wikimedia.org/P45870 and previous config saved to /var/cache/conftool/dbconfig/20230315-062643-root.json
* 06:20 marostegui: Remove pki2001 from m1 grants [[phab:T332018|T332018]]


== 2015-08-24 ==
== 2023-03-14 ==
* 23:46 logmsgbot: mattflaschen@tin Synchronized wmf-config: Remove wgFlowOccupyPages (duration: 00m 12s)
* 23:29 brennen@deploy2002: Finished scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] (duration: 10m 32s)
* 23:38 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/233636/ (duration: 00m 12s)
* 23:20 brennen@deploy2002: brennen and umherirrender: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:16 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings-labs.php: change OAuth DB on beta +enable writes (duration: 00m 12s)
* 23:19 brennen@deploy2002: Started scap: Backport for [[gerrit:898867{{!}}action: Restrict action.delete.js to action=delete pages (T330205)]]
* 21:55 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings-labs.php: set beta OAuth to readonly (duration: 00m 13s)
* 22:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 21:54 logmsgbot: tgr@tin Synchronized wmf-config/CommonSettings-labs.php: set beta OAuth to readonly (duration: 00m 13s)
* 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 21:42 akosiaris: enabled puppet on maps-test200{1,2,3,4}.codfw.wmnet
* 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 20:21 arlolra: updated Parsoid to version 0b2fbae7
* 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 18:58 bblack: reloading primary LVS pybals for BlankPage change ( https://gerrit.wikimedia.org/r/#/c/233053/ ) + ulimit fixup ( https://gerrit.wikimedia.org/r/#/c/233484/ )
* 22:08 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 18:31 bblack: reloading backup LVS pybals for BlankPage change ( https://gerrit.wikimedia.org/r/#/c/233053/ )
* 21:38 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 17:19 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 21:38 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 16:23 logmsgbot: bd808@tin Purged l10n cache for 1.26wmf18
* 21:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 16:23 logmsgbot: bd808@tin Purged l10n cache for 1.26wmf17
* 21:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 16:05 andrewbogott: rebooting labnet1001
* 21:16 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 15:53 _joe_: restarted nutcracker on mw1010, holding a 150 GB deleted logfile
* 21:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 15:47 Krenair: running sync-common on mw1010 to bring it up to date after clearing some space
* 21:11 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 15:44 logmsgbot: krenair@tin Purged l10n cache for 1.26wmf16
* 21:11 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 15:41 logmsgbot: krenair@tin Purged l10n cache for 1.26wmf15
* 20:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 15:38 logmsgbot: krenair@tin Synchronized php-1.26wmf19/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/233411/1 (duration: 00m 49s)
* 20:47 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 15:37 hashar: stopped and restarted Zuul
* 20:43 ejegg: payments-wiki upgraded from {{Gerrit|61c30a4f}} to {{Gerrit|1532b107}}
* 15:31 logmsgbot: krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/232919/ and https://gerrit.wikimedia.org/r/#/c/232915/ (duration: 01m 34s)
* 20:35 zabe@deploy2002: Finished scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] (duration: 08m 36s)
* 15:29 logmsgbot: krenair@tin Synchronized w/static/images/project-logos/knwikiquote.png: https://gerrit.wikimedia.org/r/#/c/232919/ (duration: 02m 04s)
* 20:28 zabe@deploy2002: zabe: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:19 Krenair: No space left on mw1010, cannot ping or ssh to mw2180
* 20:27 zabe@deploy2002: Started scap: Backport for [[gerrit:897997{{!}}dewiki: Allow 'crats to remove sysopship and manage importers (T331921)]]
* 15:16 logmsgbot: krenair@tin Synchronized docroot/noc/db.php: https://gerrit.wikimedia.org/r/#/c/232920/ (duration: 01m 34s)
* 20:04 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 15:14 hashar: apt-get upgrade on gallium
* 20:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:48 andrewbogott: forcing wikitech logouts in order to flush everyone’s service catalog
* 19:47 topranks: Reboot cloudsw1-b1-codfw to upgrade JunOS version [[phab:T327919|T327919]]
* 14:18 ottomata: starting to move kafka topic-partitions to new brokers (and off of analytics1021)
* 19:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
* 14:12 yurik: git deploy synced kartotherian
* 19:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
* 13:55 akosiaris: disable puppet on fermium preparing for reinstallation
* 19:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:55 akosiaris: disable puppet on fermium
* 19:30 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): uneventful at group0. i'm afk for about an hour.
* 12:54 akosiaris: stop etcd on etcd1002.eqiad.wmnet. Already removed from the cluster
* 19:13 ejegg: civicrm upgraded from {{Gerrit|dbe3b716}} to {{Gerrit|68fa85cf}}
* 11:58 _joe_: stopping etcd on etcd1001
* 18:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS bullseye
* 11:50 _joe_: restarting etcd on etcd1001
* 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
* 09:00 YuviPanda: starting up replicate for tools on labstore1002
* 18:28 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
* 09:00 YuviPanda: cleaning up lockdir on labstore for maps and tools
* 18:27 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 09:00 YuviPanda: others replication on labstore1002 completed successfully
* 18:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
* 08:31 YuviPanda: cleaned up others lockdir for replication on labstore1002 and started it manually
* 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 06:43 jynus: reloading dbproxy1003 service
* 18:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 02:21 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 36s)
* 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 18:22 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 30s)
* 18:22 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
* 18:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 18:13 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]]
* 18:13 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS bullseye
* 18:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
* 18:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
* 18:03 brennen: 1.40.0-wmf.27 train ([[phab:T330205|T330205]]): no current blockers, rolling to group0.
* 17:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 17:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 17:58 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:56 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:56 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:55 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 17:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 17:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 16:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
* 16:47 sukhe: rolling restart of pdns-rec in A:wikidough to pick up config changes
* 16:47 sukhe: rolling restart of pdns-rec to pick up config changes
* 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pki2001.codfw.wmnet
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
* 16:13 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
* 16:11 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 16:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
* 16:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
* 16:00 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts pki2001.codfw.wmnet
* 15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS bullseye
* 15:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
* 15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
* 15:32 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
* 15:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
* 15:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
* 15:19 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS bullseye
* 15:00 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:59 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 14:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
* 14:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 14:53 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 14:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 14:52 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 14:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 14:51 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:42 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:37 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1001.eqiad.wmnet with OS bullseye
* 14:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
* 14:16 claime: All active/active services in eqiad repooled, DNS issues resolved - [[phab:T331541|T331541]]
* 14:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2122 weight', diff saved to https://phabricator.wikimedia.org/P45866 and previous config saved to /var/cache/conftool/dbconfig/20230314-140926-root.json
* 14:01 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki1001.eqiad.wmnet with OS bullseye
* 14:00 jbond: reimage pki1001
* 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
* 13:33 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results (again, with sukhe's more-correct variant!)
* 13:27 TheresNoTime: close UTC afternoon backport window
* 13:26 samtar@deploy2002: Finished scap: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] (duration: 07m 24s)
* 13:20 samtar@deploy2002: samtar and urbanecm: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:19 samtar@deploy2002: Started scap: Backport for [[gerrit:898700{{!}}arwiki: Add new throttle rule (T331973)]]
* 13:18 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results
* 13:18 samtar@deploy2002: Finished scap: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] (duration: 07m 55s)
* 13:11 samtar@deploy2002: esanders and samtar: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 13:10 samtar@deploy2002: Started scap: Backport for [[gerrit:894094{{!}}Enable VE on more namespaces on foundationwiki (T331079)]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
* 13:04 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 13:02 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 12:44 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 12:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45864 and previous config saved to /var/cache/conftool/dbconfig/20230314-123515-marostegui.json
* 12:23 moritzm: installing git security updates
* 12:20 samtar@deploy2002: Finished scap: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] (duration: 09m 12s)
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45863 and previous config saved to /var/cache/conftool/dbconfig/20230314-122009-marostegui.json
* 12:20 TheresNoTime: `Command '['helmfile', '-e', 'eqiad', '--selector', 'name=canary', 'apply']' returned non-zero exit status 1.` (P45862) during scap deployment of [[phab:T297396|T297396]] + [[phab:T331680|T331680]] — scap rolled back
* 12:18 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host pki-root1001.eqiad.wmnet with OS bullseye
* 12:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 12:13 samtar@deploy2002: samtar and varnent: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 12:11 samtar@deploy2002: Started scap: Backport for [[gerrit:896224{{!}}[foundationwiki] Grant translation admin rights to 'editor' group (T297396)]], [[gerrit:896216{{!}}docroot: Update privacy policy footer link (T331680)]]
* 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
* 12:08 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
* 12:08 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 12:06 claime: Unlocked scap deployments - [[phab:T331541|T331541]]
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45861 and previous config saved to /var/cache/conftool/dbconfig/20230314-120503-marostegui.json
* 12:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 12:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
* 11:51 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
* 11:51 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool appservers-ro in eqiad: [[phab:T331541|T331541]]
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45860 and previous config saved to /var/cache/conftool/dbconfig/20230314-114957-marostegui.json
* 11:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 11:41 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 11:27 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 11:27 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45857 and previous config saved to /var/cache/conftool/dbconfig/20230314-112354-marostegui.json
* 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45856 and previous config saved to /var/cache/conftool/dbconfig/20230314-112333-marostegui.json
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
* 11:19 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
* 11:13 claime: We are encountering unexpected DNS anycast issues following [[phab:T331541|T331541]], latencies are increased but no production outage.
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45855 and previous config saved to /var/cache/conftool/dbconfig/20230314-110826-marostegui.json
* 11:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
* 11:03 akosiaris@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
* 10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45854 and previous config saved to /var/cache/conftool/dbconfig/20230314-105319-marostegui.json
* 10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: [[phab:T331541|T331541]]
* 10:48 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: [[phab:T331541|T331541]]
* 10:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - [[phab:T331541|T331541]]
* 10:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki-root1001.eqiad.wmnet with OS bullseye
* 10:42 jbond: reimage pki-root1001
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45853 and previous config saved to /var/cache/conftool/dbconfig/20230314-103813-marostegui.json
* 10:33 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - [[phab:T331541|T331541]]
* 10:32 claime: Repooling all active/active services in eqiad - [[phab:T331541|T331541]]
* 10:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
* 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
* 10:28 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
* 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
* 10:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=99)
* 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
* 10:28 claime: Running sre.switchdc.mediawiki.00-optional-warmup-caches - [[phab:T331541|T331541]]
* 10:21 jbond: move pki.discovery.wmnet to pki2002 (bullseye)
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45852 and previous config saved to /var/cache/conftool/dbconfig/20230314-101918-marostegui.json
* 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45851 and previous config saved to /var/cache/conftool/dbconfig/20230314-101840-marostegui.json
* 10:15 jayme: enabling puppet on P:calico::kubernetes for [[phab:T325268|T325268]]
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45850 and previous config saved to /var/cache/conftool/dbconfig/20230314-100334-marostegui.json
* 10:02 claime: Locking scap deployment for service switchover - [[phab:T331541|T331541]]
* 10:00 claime: Locking scap deployment for service switchover - [[phab:T330651|T330651]]
* 09:56 jayme: disabling puppet on P:calico::kubernetes for [[phab:T325268|T325268]]
* 09:54 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:53 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 09:51 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45849 and previous config saved to /var/cache/conftool/dbconfig/20230314-094828-marostegui.json
* 09:42 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:36 moritzm: installing NSS security updates
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45848 and previous config saved to /var/cache/conftool/dbconfig/20230314-093321-marostegui.json
* 09:32 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:23 Emperor: reboot ms-be2040 [[phab:T331860|T331860]]
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45847 and previous config saved to /var/cache/conftool/dbconfig/20230314-090649-marostegui.json
* 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45846 and previous config saved to /var/cache/conftool/dbconfig/20230314-084249-marostegui.json
* 08:38 vgutierrez: test HAProxy 2.6.10 in cp4044 and cp4045
* 08:31 vgutierrez: fetch haproxy 2.6.10 for thirdparty/haproxy26 (buster && bullseye) @ apt.wm.o
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45845 and previous config saved to /var/cache/conftool/dbconfig/20230314-082743-marostegui.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45843 and previous config saved to /var/cache/conftool/dbconfig/20230314-081236-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45842 and previous config saved to /var/cache/conftool/dbconfig/20230314-075730-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45841 and previous config saved to /var/cache/conftool/dbconfig/20230314-073210-marostegui.json
* 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45840 and previous config saved to /var/cache/conftool/dbconfig/20230314-073149-marostegui.json
* 07:26 marostegui: Migrate db1183 to mariadb m5 eqiad dbmaint 10.6 [[phab:T322294|T322294]]
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45839 and previous config saved to /var/cache/conftool/dbconfig/20230314-071643-marostegui.json
* 07:13 marostegui: Migrate db2135 to mariadb m5 codfw dbmaint 10.6
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45838 and previous config saved to /var/cache/conftool/dbconfig/20230314-070137-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45837 and previous config saved to /var/cache/conftool/dbconfig/20230314-064630-marostegui.json
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts centrallog1001
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 06:41 hashar: gerrit: changed `operations/puppet` merge strategy to allow "content merges" (see `ops` list for the rationale)
* 06:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
* 06:34 denisse@cumin1001: START - Cookbook sre.dns.netbox
* 06:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts centrallog1001
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45836 and previous config saved to /var/cache/conftool/dbconfig/20230314-061633-marostegui.json
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 06:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
* 05:07 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 05:05 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@61ef435]: 0.3.122 (duration: 08m 45s)
* 04:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.122` on canary `wdqs1003`; proceeding to rest of fleet
* 04:56 ryankemper@deploy2002: Started deploy [wdqs/wdqs@61ef435]: 0.3.122
* 04:56 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.122`. Pre-deploy tests passing on canary `wdqs1003`
* 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.25 (duration: 02m 20s)
* 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]] (duration: 51m 02s)
* 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.27  refs [[phab:T330205|T330205]]
* 02:22 legoktm: removed user's 2FA on wikitech for [[phab:T331955|T331955]]
* 02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45835 and previous config saved to /var/cache/conftool/dbconfig/20230314-022023-marostegui.json
* 02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45834 and previous config saved to /var/cache/conftool/dbconfig/20230314-020517-marostegui.json
* 01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45833 and previous config saved to /var/cache/conftool/dbconfig/20230314-015011-marostegui.json
* 01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45832 and previous config saved to /var/cache/conftool/dbconfig/20230314-013504-marostegui.json
* 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45831 and previous config saved to /var/cache/conftool/dbconfig/20230314-012442-marostegui.json
* 01:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 01:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45830 and previous config saved to /var/cache/conftool/dbconfig/20230314-012421-marostegui.json
* 01:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45829 and previous config saved to /var/cache/conftool/dbconfig/20230314-010915-marostegui.json
* 00:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45828 and previous config saved to /var/cache/conftool/dbconfig/20230314-005409-marostegui.json
* 00:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45827 and previous config saved to /var/cache/conftool/dbconfig/20230314-003903-marostegui.json
* 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45826 and previous config saved to /var/cache/conftool/dbconfig/20230314-002840-marostegui.json
* 00:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 00:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
* 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45825 and previous config saved to /var/cache/conftool/dbconfig/20230314-002819-marostegui.json
* 00:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45824 and previous config saved to /var/cache/conftool/dbconfig/20230314-001313-marostegui.json


== 2015-08-23 ==
== 2023-03-13 ==
* 16:54 urandom: bouncing Cassandra on restbase1001 to apply temporary GC settings
* 23:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45823 and previous config saved to /var/cache/conftool/dbconfig/20230313-235807-marostegui.json
* 02:20 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 23s)
* 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45822 and previous config saved to /var/cache/conftool/dbconfig/20230313-234301-marostegui.json
* 23:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 23:33 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 23:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45821 and previous config saved to /var/cache/conftool/dbconfig/20230313-233127-marostegui.json
* 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
* 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
* 23:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45820 and previous config saved to /var/cache/conftool/dbconfig/20230313-233050-marostegui.json
* 23:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45819 and previous config saved to /var/cache/conftool/dbconfig/20230313-231544-marostegui.json
* 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45818 and previous config saved to /var/cache/conftool/dbconfig/20230313-230038-marostegui.json
* 22:48 zabe@deploy2002: Finished scap: [[gerrit:898037{{!}}noc: Switch default selection on db.php from eqiad to codfw]] (duration: 06m 56s)
* 22:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45817 and previous config saved to /var/cache/conftool/dbconfig/20230313-224532-marostegui.json
* 22:41 zabe@deploy2002: Started scap: [[gerrit:898037{{!}}noc: Switch default selection on db.php from eqiad to codfw]]
* 22:40 zabe@deploy2002: scap failed: BrokenPipeError [Errno 32] Broken pipe (duration: 00m 00s)
* 22:40 zabe@deploy2002: Started scap: [[gerrit:898037]]
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45816 and previous config saved to /var/cache/conftool/dbconfig/20230313-223331-marostegui.json
* 22:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 22:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45815 and previous config saved to /var/cache/conftool/dbconfig/20230313-223309-marostegui.json
* 22:30 sbassett@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Set ext:StopForumSpam to enforce on es.wikiversity (duration: 06m 59s)
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45814 and previous config saved to /var/cache/conftool/dbconfig/20230313-221803-marostegui.json
* 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45813 and previous config saved to /var/cache/conftool/dbconfig/20230313-220257-marostegui.json
* 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45812 and previous config saved to /var/cache/conftool/dbconfig/20230313-214751-marostegui.json
* 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45811 and previous config saved to /var/cache/conftool/dbconfig/20230313-213544-marostegui.json
* 21:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 21:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
* 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45810 and previous config saved to /var/cache/conftool/dbconfig/20230313-213523-marostegui.json
* 21:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS bullseye
* 21:21 wfan: remove -d for jobs-dlocal queue runner
* 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45809 and previous config saved to /var/cache/conftool/dbconfig/20230313-212017-marostegui.json
* 21:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45808 and previous config saved to /var/cache/conftool/dbconfig/20230313-210510-marostegui.json
* 21:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
* 21:01 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
* 21:01 ejegg: enabled jobs-dlocal queue runner
* 21:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45807 and previous config saved to /var/cache/conftool/dbconfig/20230313-205004-marostegui.json
* 20:47 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS bullseye
* 20:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein (duration: 00m 14s)
* 20:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein
* 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45806 and previous config saved to /var/cache/conftool/dbconfig/20230313-203824-marostegui.json
* 20:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 20:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
* 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45805 and previous config saved to /var/cache/conftool/dbconfig/20230313-203802-marostegui.json
* 20:27 kindrobot: close UTC late backport window
* 20:26 kindrobot@deploy2002: Finished scap: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]] (duration: 12m 11s)
* 20:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45804 and previous config saved to /var/cache/conftool/dbconfig/20230313-202256-marostegui.json
* 20:16 kindrobot@deploy2002: kindrobot and ksarabia: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:15 kindrobot: start UTC late backport window
* 20:14 kindrobot@deploy2002: Started scap: Backport for [[gerrit:894765{{!}}Add header at top of main page (T325362)]]
* 20:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45803 and previous config saved to /var/cache/conftool/dbconfig/20230313-200750-marostegui.json
* 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45802 and previous config saved to /var/cache/conftool/dbconfig/20230313-195244-marostegui.json
* 19:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 19:51 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
* 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45801 and previous config saved to /var/cache/conftool/dbconfig/20230313-194148-marostegui.json
* 19:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 19:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
* 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45800 and previous config saved to /var/cache/conftool/dbconfig/20230313-194116-marostegui.json
* 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 19:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
* 19:38 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
* 19:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
* 19:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45799 and previous config saved to /var/cache/conftool/dbconfig/20230313-192610-marostegui.json
* 19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45798 and previous config saved to /var/cache/conftool/dbconfig/20230313-191104-marostegui.json
* 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45797 and previous config saved to /var/cache/conftool/dbconfig/20230313-185558-marostegui.json
* 18:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:48 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:47 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45796 and previous config saved to /var/cache/conftool/dbconfig/20230313-184502-marostegui.json
* 18:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
* 18:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable (duration: 00m 13s)
* 18:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable
* 18:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:36 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date (duration: 00m 14s)
* 18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 18:36 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date
* 18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
* 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45795 and previous config saved to /var/cache/conftool/dbconfig/20230313-183628-marostegui.json
* 18:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45794 and previous config saved to /var/cache/conftool/dbconfig/20230313-182121-marostegui.json
* 18:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 18:11 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45793 and previous config saved to /var/cache/conftool/dbconfig/20230313-180615-marostegui.json
* 17:56 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 17:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45792 and previous config saved to /var/cache/conftool/dbconfig/20230313-175109-marostegui.json
* 17:50 dancy@deploy2002: Finished scap: test cleanup (duration: 06m 40s)
* 17:44 dancy@deploy2002: Started scap: test cleanup
* 17:43 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45791 and previous config saved to /var/cache/conftool/dbconfig/20230313-174030-marostegui.json
* 17:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 17:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45790 and previous config saved to /var/cache/conftool/dbconfig/20230313-174009-marostegui.json
* 17:35 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45789 and previous config saved to /var/cache/conftool/dbconfig/20230313-172503-marostegui.json
* 17:22 dancy@deploy2002: Finished scap: testing [[phab:T329857|T329857]] (duration: 06m 54s)
* 17:16 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 17:15 dancy@deploy2002: Started scap: testing [[phab:T329857|T329857]]
* 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:13 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:12 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:12 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
* 17:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:11 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 17:11 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:10 Emperor: roll-restart of codfw eqiad frontends
* 17:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:10 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:10 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45788 and previous config saved to /var/cache/conftool/dbconfig/20230313-170955-marostegui.json
* 17:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:08 dancy@deploy2002: Installation of scap version "4.46.0" completed for 553 hosts
* 17:07 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
* 17:04 bd808: Ran cache.purge_openstack_users() for Striker following deploy of {{Gerrit|e1f7491}} ([[phab:T331674|T331674]])
* 17:04 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45787 and previous config saved to /var/cache/conftool/dbconfig/20230313-165449-marostegui.json
* 16:47 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45785 and previous config saved to /var/cache/conftool/dbconfig/20230313-164410-marostegui.json
* 16:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 16:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45784 and previous config saved to /var/cache/conftool/dbconfig/20230313-164349-marostegui.json
* 16:36 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45783 and previous config saved to /var/cache/conftool/dbconfig/20230313-162843-marostegui.json
* 16:20 moritzm: imported tideways 5.0.4-2+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 16:18 dancy@deploy2002: Finished scap: testing (duration: 06m 53s)
* 16:17 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 16:17 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 16:17 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 16:16 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 16:16 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 16:16 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45782 and previous config saved to /var/cache/conftool/dbconfig/20230313-161337-marostegui.json
* 16:11 dancy@deploy2002: Started scap: testing
* 16:06 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: [[gerrit:897915{{!}} Bumping portals to master (T128546)]] (duration: 06m 15s)
* 16:00 moritzm: imported xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 16:00 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:897915{{!}} Bumping portals to master (T128546)]] (duration: 06m 43s)
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45781 and previous config saved to /var/cache/conftool/dbconfig/20230313-155830-marostegui.json
* 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45780 and previous config saved to /var/cache/conftool/dbconfig/20230313-154641-marostegui.json
* 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 15:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 15:35 moritzm: imported php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1+icu67u1 [[phab:T329491|T329491]]
* 15:31 dancy@deploy2002: Finished scap: testing [[phab:T329857|T329857]] (duration: 10m 08s)
* 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 15:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
* 15:21 dancy@deploy2002: Started scap: testing [[phab:T329857|T329857]]
* 15:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 15:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45779 and previous config saved to /var/cache/conftool/dbconfig/20230313-150523-marostegui.json
* 15:03 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
* 14:53 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
* 14:51 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
* 14:51 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
* 14:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45778 and previous config saved to /var/cache/conftool/dbconfig/20230313-145016-marostegui.json
* 14:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
* 14:38 jbond: disable puppet fleet wide to debug strange issue
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45777 and previous config saved to /var/cache/conftool/dbconfig/20230313-143510-marostegui.json
* 14:23 claime: switch noc.wikimedia.org from eqiad to codfw - [[phab:T331634|T331634]]
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45776 and previous config saved to /var/cache/conftool/dbconfig/20230313-142004-marostegui.json
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45774 and previous config saved to /var/cache/conftool/dbconfig/20230313-141409-marostegui.json
* 14:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 14:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45773 and previous config saved to /var/cache/conftool/dbconfig/20230313-141348-marostegui.json
* 14:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45772 and previous config saved to /var/cache/conftool/dbconfig/20230313-135842-marostegui.json
* 13:50 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
* 13:49 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
* 13:48 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
* 13:48 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6] (duration: 00m 11s)
* 13:48 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6]
* 13:47 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
* 13:46 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
* 13:45 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
* 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45770 and previous config saved to /var/cache/conftool/dbconfig/20230313-134336-marostegui.json
* 13:40 moritzm: imported wikidiff2 1.13.0-1+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45769 and previous config saved to /var/cache/conftool/dbconfig/20230313-132829-marostegui.json
* 13:25 moritzm: imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 [[phab:T329491|T329491]]
* 13:25 moritzm: imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 T329491
* 13:23 taavi@deploy2002: Finished scap: Backport for [[gerrit:897195{{!}}[trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399)]], [[gerrit:897837{{!}}[trwiki] Removing the temporary logo, previously added, and already reverted (T329047)]] (duration: 08m 10s)
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45768 and previous config saved to /var/cache/conftool/dbconfig/20230313-132123-marostegui.json
* 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45767 and previous config saved to /var/cache/conftool/dbconfig/20230313-132101-marostegui.json
* 13:16 taavi@deploy2002: taavi and superpes: Backport for [[gerrit:897195{{!}}[trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399)]], [[gerrit:897837{{!}}[trwiki] Removing the temporary logo, previously added, and already reverted (T329047)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:15 taavi@deploy2002: Started scap: Backport for [[gerrit:897195{{!}}[trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399)]], [[gerrit:897837{{!}}[trwiki] Removing the temporary logo, previously added, and already reverted (T329047)]]
* 13:13 taavi@deploy2002: Finished scap: Backport for [[gerrit:896321{{!}}zhwiki: Add movefile to extendedconfirmed (T331691)]] (duration: 09m 29s)
* 13:11 moritzm: imported php-luasandbox 4.0.2-3+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45766 and previous config saved to /var/cache/conftool/dbconfig/20230313-130555-marostegui.json
* 13:05 taavi@deploy2002: stang and taavi: Backport for [[gerrit:896321{{!}}zhwiki: Add movefile to extendedconfirmed (T331691)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 13:03 taavi@deploy2002: Started scap: Backport for [[gerrit:896321{{!}}zhwiki: Add movefile to extendedconfirmed (T331691)]]
* 13:00 moritzm: imported php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45764 and previous config saved to /var/cache/conftool/dbconfig/20230313-125049-marostegui.json
* 12:48 hnowlan: restarting codfw thumbor instances to attempt to remedy 502 issues
* 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.codfw.wmnet
* 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.codfw.wmnet
* 12:37 moritzm: imported php-geoip 1.1.1-7+wmf2+buster1+icu67u1 [[phab:T329491|T329491]]
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45763 and previous config saved to /var/cache/conftool/dbconfig/20230313-123543-marostegui.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45762 and previous config saved to /var/cache/conftool/dbconfig/20230313-122928-marostegui.json
* 12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 12:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45761 and previous config saved to /var/cache/conftool/dbconfig/20230313-122906-marostegui.json
* 12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 12:19 moritzm: imported php-redis 5.3.2+4.3.0-2+deb11u1+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45760 and previous config saved to /var/cache/conftool/dbconfig/20230313-121400-marostegui.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45759 and previous config saved to /var/cache/conftool/dbconfig/20230313-115854-marostegui.json
* 11:58 moritzm: imported php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 11:46 moritzm: imported php-igbinary 3.2.1+2.0.8-2+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45758 and previous config saved to /var/cache/conftool/dbconfig/20230313-114348-marostegui.json
* 11:31 moritzm: imported php-apcu 5.1.19+4.0.11-3+wmf2+buster1+icu67u1 [[phab:T329491|T329491]]
* 11:22 jnuche@deploy2002: Installation of scap version "latest" completed for 553 hosts
* 11:21 jnuche@deploy2002: Installing scap version "latest" for 553 hosts
* 11:11 moritzm: imported php-msgpack 2.1.2+0.5.7-2+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 10:55 moritzm: imported php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45757 and previous config saved to /var/cache/conftool/dbconfig/20230313-104322-marostegui.json
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45756 and previous config saved to /var/cache/conftool/dbconfig/20230313-104246-marostegui.json
* 10:38 moritzm: imported php-pcov 1.0.6-4+wmf1~buster1+icu67u1 [[phab:T329491|T329491]]
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45755 and previous config saved to /var/cache/conftool/dbconfig/20230313-102740-marostegui.json
* 10:26 moritzm: imported php-defaults 7.4+76+wmf1~buster2+icu67u1 [[phab:T329491|T329491]]
* 10:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55701
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45754 and previous config saved to /var/cache/conftool/dbconfig/20230313-101234-marostegui.json
* 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55701
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38193
* 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38193
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46632
* 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46632
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6663
* 10:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6663
* 10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45558
* 10:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45558
* 10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38082
* 10:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38082
* 10:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 668
* 10:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 668
* 10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 10:02 moritzm: imported dh-php 0.35+wmf1+buster1+icu67u1 [[phab:T329491|T329491]]
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45753 and previous config saved to /var/cache/conftool/dbconfig/20230313-095728-marostegui.json
* 09:55 vgutierrez: Enable haproxy hardening in cp hosts globally - [[phab:T323944|T323944]]
* 09:52 zabe@deploy2002: Finished scap: Backport for [[gerrit:896037{{!}}Drop loading of former extension Renameuser's i18n strings [Re-apply]]] (duration: 07m 40s)
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2150 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45752 and previous config saved to /var/cache/conftool/dbconfig/20230313-095119-marostegui.json
* 09:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 09:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45751 and previous config saved to /var/cache/conftool/dbconfig/20230313-095058-marostegui.json
* 09:48 jayme: pcc-worker1003:~# rm -r /srv/jenkins/puppet-compiler/40076 - / back to 70%
* 09:46 zabe@deploy2002: jforrester and zabe: Backport for [[gerrit:896037{{!}}Drop loading of former extension Renameuser's i18n strings [Re-apply]]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 09:45 jayme: pcc-worker1002:~# rm -r /srv/jenkins/puppet-compiler/40078 - / back to 47% usage
* 09:44 zabe@deploy2002: Started scap: Backport for [[gerrit:896037{{!}}Drop loading of former extension Renameuser's i18n strings [Re-apply]]]
* 09:44 zabe@deploy2002: Finished scap: Backport for [[gerrit:897804{{!}}Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)]] (duration: 07m 52s)
* 09:40 jayme: pcc-worker1001:~# rm -r /srv/jenkins/puppet-compiler/40079 /srv/jenkins/puppet-compiler/38943 - / back to 68% usage
* 09:38 zabe@deploy2002: zabe: Backport for [[gerrit:897804{{!}}Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 09:36 zabe@deploy2002: Started scap: Backport for [[gerrit:897804{{!}}Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)]]
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45750 and previous config saved to /var/cache/conftool/dbconfig/20230313-093552-marostegui.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45749 and previous config saved to /var/cache/conftool/dbconfig/20230313-092045-marostegui.json
* 09:16 moritzm: installing python-werkzeug security updates
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45748 and previous config saved to /var/cache/conftool/dbconfig/20230313-090539-marostegui.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2122 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45747 and previous config saved to /var/cache/conftool/dbconfig/20230313-085937-marostegui.json
* 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45746 and previous config saved to /var/cache/conftool/dbconfig/20230313-085916-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45745 and previous config saved to /var/cache/conftool/dbconfig/20230313-084409-marostegui.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45744 and previous config saved to /var/cache/conftool/dbconfig/20230313-082903-marostegui.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45743 and previous config saved to /var/cache/conftool/dbconfig/20230313-081357-marostegui.json
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2121 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45742 and previous config saved to /var/cache/conftool/dbconfig/20230313-080759-marostegui.json
* 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45741 and previous config saved to /var/cache/conftool/dbconfig/20230313-080738-marostegui.json
* 08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 08:02 moritzm: installing curl security updates
* 07:58 zabe@deploy2002: Finished scap: Backport for [[gerrit:897188{{!}}use core Renameuser classes (T27482)]], [[gerrit:897187{{!}}UserRenameHandler: Use core RenameUser classes (T27482)]] (duration: 07m 02s)
* 07:53 zabe@deploy2002: zabe: Backport for [[gerrit:897188{{!}}use core Renameuser classes (T27482)]], [[gerrit:897187{{!}}UserRenameHandler: Use core RenameUser classes (T27482)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45740 and previous config saved to /var/cache/conftool/dbconfig/20230313-075232-marostegui.json
* 07:51 zabe@deploy2002: Started scap: Backport for [[gerrit:897188{{!}}use core Renameuser classes (T27482)]], [[gerrit:897187{{!}}UserRenameHandler: Use core RenameUser classes (T27482)]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45739 and previous config saved to /var/cache/conftool/dbconfig/20230313-073725-marostegui.json
* 07:37 marostegui: Remove pagetriage_log from enwiki [[phab:T328309|T328309]]
* 07:32 kartik@deploy2002: Finished scap: Backport for [[gerrit:897404{{!}}testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)]] (duration: 17m 04s)
* 07:25 kartik@deploy2002: kartik: Backport for [[gerrit:897404{{!}}testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45738 and previous config saved to /var/cache/conftool/dbconfig/20230313-072219-marostegui.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2120 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45737 and previous config saved to /var/cache/conftool/dbconfig/20230313-071522-marostegui.json
* 07:15 kartik@deploy2002: Started scap: Backport for [[gerrit:897404{{!}}testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)]]
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45736 and previous config saved to /var/cache/conftool/dbconfig/20230313-071501-marostegui.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45735 and previous config saved to /var/cache/conftool/dbconfig/20230313-065954-marostegui.json
* 06:52 marostegui_: Remove pagetriage_log from testwiki and test2wiki [[phab:T328309|T328309]]
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45734 and previous config saved to /var/cache/conftool/dbconfig/20230313-064448-marostegui.json
* 06:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9873
* 06:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9873
* 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9507
* 06:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9507
* 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15830
* 06:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15830
* 06:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9902
* 06:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9902
* 06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45733 and previous config saved to /var/cache/conftool/dbconfig/20230313-062942-marostegui.json
* 06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
* 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34549
* 06:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34549
* 06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 29357
* 06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 29357
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 ([[phab:T329260|T329260]])', diff saved to https://phabricator.wikimedia.org/P45732 and previous config saved to /var/cache/conftool/dbconfig/20230313-062244-marostegui.json
* 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
* 06:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138886
* 06:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138886
* 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
* 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
* 06:16 marostegui_: Deploy schema change on s3 codfw dbmaint [[phab:T329684|T329684]]
* 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
* 04:37 kart_: Updated cxserver to 2023-03-09-061555-production ([[phab:T331097|T331097]], [[phab:T327102|T327102]], [[phab:T326541|T326541]])
* 04:19 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 04:19 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 04:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 04:17 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 04:12 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 04:12 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply


== 2015-08-22 ==
== 2023-03-12 ==
* 23:08 logmsgbot: krenair@tin Synchronized php-1.26wmf19/extensions/AbuseFilter/maintenance/addMissingLoggingEntries.php: (no message) (duration: 01m 05s)
* 10:47 elukey: reset offsets on kafka jumbo for benthos webrequest live (as indicated in https://phabricator.wikimedia.org/T331801#8685569)
* 19:41 YuviPanda: manually remove old snapshots from labstore1002
* 07:50 elukey: restart benthos-webrequest-live on centrallog1002 - [[phab:T331801|T331801]]
* 17:28 chasemp: tweaking apache on iridium T109941
* 07:49 elukey: restart benthos-webrequest-live on centrallog2002 - [[phab:T331801|T331801]]
* 16:45 chasemp: scratch that as we have mpm_prefork enabled :)
* 07:49 elukey: stop and mask benthos-webrequest-live on centrallog1001 - [[phab:T331801|T331801]]
* 16:33 chasemp: raising values in mpm_worker.conf for iridium to debug and hopefully head off further crashing
* 14:44 twentyafterfour: restarted apache2 on iridium. Segfault again. This time I at least got one clue in the log:  "zend_mm_heap corrupted"
* 09:18 twentyafterfour: phabricator seems stable now, restarting apache2 on iridium did the trick, unfortunately we didn't learn why
* 08:36 twentyafterfour: restarted phd on iridium
* 08:36 twentyafterfour: restarted apache2 on iridium
* 02:20 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 06m 09s)
* 00:26 mutante: deleting blog.sh and blog_pageviews crontab from stat1003


== 2015-08-21 ==
== 2023-03-10 ==
* 23:34 urandom: restarting Cassandra on restbase1001 to restore baseline settings
* 22:43 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 23:11 yurik: synced kartotherian
* 22:32 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 22:35 mutante: deleting held messages on mailman that are older than 1 year
* 22:26 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 21:56 awight: increasing paymentswiki orphan gc-cc-limbo expiry time to 30 days
* 22:16 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 21:45 mutante: had to reset list creator password for mailman - ask me if you think you should have it and don't (this is not the master pass)
* 21:24 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:37 logmsgbot: ori@tin Synchronized php-1.26wmf19/includes: I1eb8dfc: Revert Count API and hook calls, with 1:1000 sampling (duration: 01m 09s)
* 21:14 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 19:43 awight: update paymentswiki from 2b08853c977eee0fd17bf00a673a3bbf2a146554 to 8ba4b5299f195cf48e6809b18a21e2d53f6eec1b
* 21:13 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 18:58 awight: disabling Amazon gateway
* 21:03 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 18:52 awight: updated paymentswiki from 049ad15323564fd5cd7f5efcadddb532a3590cef to 2b08853c977eee0fd17bf00a673a3bbf2a146554
* 20:43 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78] (duration: 00m 10s)
* 16:06 jynus: checksumming dewiki database, higher write rate/dbstore lag expected temporarily
* 20:43 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78]
* 15:10 ottomata: rebooting kafka broker analytics1021 to hopefully reload /dev/sdg with new disk, also will turn on hyperthreading
* 20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 14:13 ottomata: rebooting analytics1056 after upgrading kernel to linux-image-3.13.0-61-generic
* 20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:58 urandom: restarting restbase1001 to apply temporary GC setting
* 19:39 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 00m 09s)
* 13:34 ottomata: stopping kafka broker on analytics1021 due to bad disk.
* 19:38 milimetric@deploy2002: Started deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942]
* 13:30 bblack: wiped ganglia apache access log on uranium, to free up half of the (full) rootfs
* 19:38 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 08m 08s)
* 10:07 godog: enable puppet on ms-fe1/ms-be1
* 19:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 09:49 godog: disable puppet on ms-fe1/ms-be1 before merging https://gerrit.wikimedia.org/r/#/c/231240/
* 19:30 milimetric@deploy2002: Started deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942]
* 07:06 _joe_: restarting gitblit, because it will be decommissioned "soon"...
* 19:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 02:34 logmsgbot: l10nupdate@tin Synchronized php-1.26wmf19/cache/l10n: l10nupdate for 1.26wmf19 (duration: 11m 19s)
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
* 19:17 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
* 19:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 19:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 19:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
* 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 18:55 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944] (duration: 00m 12s)
* 18:55 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944]
* 18:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
* 18:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 18:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
* 18:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 18:31 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
* 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
* 18:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 18:12 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 18:04 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:59 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:53 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:52 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 17:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:40 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:34 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:28 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:22 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 17:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
* 16:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 16:42 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 16:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 16:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 16:04 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2003-dev']
* 16:04 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:59 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:59 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:57 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:57 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2003-dev']
* 15:53 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:50 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:50 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
* 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
* 15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
* 15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
* 15:08 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
* 14:52 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
* 14:50 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 14:47 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 14:38 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
* 14:36 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
* 14:22 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
* 14:20 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
* 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:08 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
* 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
* 13:54 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
* 13:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
* 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply