
Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(mutante: people1003 - rsyncing /home from people1002)
imported>Stashbot
(ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 00m 57s))
 
(126 intermediate revisions by 3 users not shown)
== 2021-04-30 ==
== 2021-09-18 ==
* 21:54 mutante: people1003 - rsyncing /home from people1002
* 01:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 00m 57s)
* 15:30 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 01:01 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 03s)
* 15:29 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 15:25 bstorm: hard rebooting cloudmetrics1002 [[phab:T275605|T275605]]
* 11:40 ladsgroup@deploy1002: Synchronized static/favicon/wikitech.ico: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 56s)
* 11:36 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 56s)
* 11:34 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 11:33 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 11:31 ladsgroup@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 09:04 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
* 09:03 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
* 08:11 moritzm: remove mc1027 from debmonitor, server is broken and won't return ([[phab:T276415|T276415]])
* 07:38 moritzm: installing iputils updates from Buster point release
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15667 and previous config saved to /var/cache/conftool/dbconfig/20210430-061549-root.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15666 and previous config saved to /var/cache/conftool/dbconfig/20210430-060046-root.json
* 05:51 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15665 and previous config saved to /var/cache/conftool/dbconfig/20210430-054542-root.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15664 and previous config saved to /var/cache/conftool/dbconfig/20210430-053038-root.json
* 05:16 marostegui: Upgrade kernel on db1114
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15663 and previous config saved to /var/cache/conftool/dbconfig/20210430-051558-marostegui.json
* 05:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1080.eqiad.wmnet
* 04:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1080.eqiad.wmnet
* 04:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo systemctl restart wdqs-blazegraph`
* 04:43 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 04:43 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 04:42 ryankemper: [[phab:T261239|T261239]] `elastic2033`, which is known to be in a state of hardware failure (we have a ticket open), is holding up the reboot of codfw. I don't think we have a good way to exclude a node currently. Going to just proceed to `eqiad` for now
* 04:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 04:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 04:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 04:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 04:03 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 03:50 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1010.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:47 ryankemper: [[phab:T280563|T280563]] about half of codfw nodes have been rebooted before the failure caused by write queue not emptying fast enough, kicking it off again: `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 03:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 01:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]


== 2021-04-29 ==
== 2021-09-17 ==
* 23:36 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:683749{{!}}Revert "DEMO: Add newline to README"]] (duration: 00m 56s)
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:18 ryankemper: [[phab:T280563|T280563]] successful reboot of `relforge100[3,4]`; `relforge` cluster is back to green status.
* 21:19 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:16 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:683747{{!}}DEMO: Add newline to README]] (duration: 00m 56s)
* 19:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 23:08 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts` (amended command)
* 17:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
* 23:06 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 17:02 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 23:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
* 22:46 ryankemper: [[phab:T280563|T280563]] Current master is `relforge1003-relforge-eqiad`, will reboot `1004` first then `1003` after
* 16:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 22:44 ryankemper: [[phab:T280563|T280563]] Bleh, we never moved the new config into spicerack, so it's trying to talk to the old relforge hosts which no longer exist. Will reboot relforge manually and use the cookbook for codfw/eqiad, and circle back later for the spicerack change
* 16:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 16:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:32 ryankemper: [[phab:T280563|T280563]] Spotted the issue; forgot to set `--without-lvs` for relforge reboot
* 16:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:27 ryankemper: [[phab:T280563|T280563]] `urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbe4bb8a518>: Failed to establish a new connection: [Errno -2] Name or service not known`
* 14:49 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 22:26 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 22:26 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 13:06 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
* 22:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 11:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 22:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 09:37 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s)
* 21:36 mutante: icinga - enabling disabled notifications for random an-worker nodes where the mgmt interface had alerts enabled but the actual host didn't
* 09:37 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency
* 21:32 mutante: icinga - enabled notifications for checks on ms-backup1001 - they were all manually disabled, but none of the checks had changed status in 50 days, which suggests someone forgot to turn them back on (a common issue with disabling notifications)
* 09:36 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s)
* 21:16 mutante: backup1001 - sudo check_bacula.py --icinga
* 09:19 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist
* 20:54 marostegui: Stop mysql on tendril for the UTC night, dbtree and tendril will remain down for a few hours [[phab:T281486|T281486]]
* 08:00 jayme: restarting php-fpm on wtp1037 and wtp1030
* 20:16 marostegui: Restart tendril database - [[phab:T281486|T281486]]
* 02:28 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'`
* 20:00 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 02:22 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer
* 19:46 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 08s)
* 01:55 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent`
* 19:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 01:48 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - [[phab:T290330|T290330]]"'`
* 19:32 dpifke@deploy1002: Finished deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484 (duration: 00m 05s)
* 00:04 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 19:32 dpifke@deploy1002: Started deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484
* 00:01 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 19:01 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki/wanobjectcache/revision_row_1/ (bad data from Sep 2019)
* 18:59 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/rl-minify-* (bad data from Aug 2018)
* 18:58 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki_ExternalGuidance_init_Google_tr_fr (bad data from Nov 2019)
* 18:38 krinkle@deploy1002: Synchronized php-1.37.0-wmf.1/includes/libs/objectcache/MemcachedBagOStuff.php: {{Gerrit|I926797a9d494a31}}, [[phab:T281480|T281480]] (duration: 01m 08s)
* 18:33 mutante: LDAP - added mmandere to wmf group ([[phab:T281344|T281344]])
* 18:10 krinkle@deploy1002: Synchronized php-1.37.0-wmf.3/includes/libs/objectcache/MemcachedBagOStuff.php: {{Gerrit|I926797a9d494a31}}, [[phab:T281480|T281480]] (duration: 01m 09s)
* 17:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:29 ryankemper: [[phab:T281498|T281498]] `sudo -E cumin 'C:role::lvs::balancer' 'sudo run-puppet-agent'`
* 16:28 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 16:27 liw@deploy1002: sync-wikiversions aborted: Revert "group[0{{!}}1] wikis to [VERSION]" (duration: 00m 01s)
* 16:22 ryankemper: [[phab:T281498|T281498]] `ryankemper@wdqs2004:~$ sudo depool`
* 16:20 ryankemper: [[phab:T281498|T281498]] `ryankemper@wdqs2004:~$ sudo run-puppet-agent`
* 16:18 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]] (duration: 02m 39s)
* 16:15 otto@deploy1002: Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]]
* 16:12 papaul: powerdown thanos-fe2001 for memory swap
* 15:44 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here)
* 15:43 ryankemper: [WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up
* 15:37 ryankemper: [WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001`
* 15:35 ryankemper: [WDQS] ^ scratch that, depooled `wdqs2001`
* 15:34 ryankemper: [WDQS] pooled `wdqs2001`
* 14:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
* 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
* 13:44 moritzm: installing Java security updates on stat* hosts
* 13:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
* 13:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
* 13:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
* 13:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
* 13:40 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]] (duration: 02m 59s)
* 13:37 otto@deploy1002: Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - [[phab:T273789|T273789]]
* 13:11 moritzm: installing postgresql-11 security updates
* 13:08 jbond42: merge netbase change to manage /etc/services
* 13:07 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 13:06 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 12:36 Amir1: upgrading Quiddity to admin in mailman3
* 12:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
* 12:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
* 12:26 moritzm: installing grub2 updates from buster point release
* 12:06 jbond42: update debmonitor.discover.wmnet ssl cert
* 11:59 ladsgroup@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:683454{{!}}Undeploy JADE from production, Part III (T281418)]] (duration: 01m 07s)
* 11:54 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683453{{!}}Undeploy JADE from production, Part II (T281418)]], Part I (duration: 01m 06s)
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683452{{!}}Undeploy JADE from production, Part I (T281418)]] (duration: 01m 07s)
* 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 11:38 mbsantos@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683548{{!}}Enable suggested values in TemplateData and VisualEditor CommonSettings (T273857)]] (duration: 01m 07s)
* 11:34 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683534{{!}}Another fix for token cookie handling (T281346)]] (duration: 01m 07s)
* 11:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683533{{!}}Another fix for token cookie handling (T281346)]] (duration: 01m 08s)
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15658 and previous config saved to /var/cache/conftool/dbconfig/20210429-113211-root.json
* 11:24 mbsantos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683547{{!}}Enable suggested values in TemplateData and VisualEditor InitialiseSettings (T273857)]] (duration: 01m 07s)
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15657 and previous config saved to /var/cache/conftool/dbconfig/20210429-111708-root.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15656 and previous config saved to /var/cache/conftool/dbconfig/20210429-110204-root.json
* 10:59 moritzm: updating apt on buster (SUA 198), which eases bullseye upgrades [[phab:T275873|T275873]]
* 10:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683135{{!}}Fix CX token cookie (T281346)]] (duration: 01m 08s)
* 10:54 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683134{{!}}Fix CX token cookie (T281346)]] (duration: 01m 09s)
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15655 and previous config saved to /var/cache/conftool/dbconfig/20210429-104700-root.json
* 10:27 marostegui: Upgrade kernel on db1110
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15654 and previous config saved to /var/cache/conftool/dbconfig/20210429-102447-marostegui.json
* 09:42 volans: uploaded pynetbox 5.3.0-2 to bullseye-wikimedia on apt.w.o
* 09:39 volans@deploy1002: Finished deploy [homer/deploy@e394769]: Release v0.2.8 (duration: 03m 30s)
* 09:35 volans@deploy1002: Started deploy [homer/deploy@e394769]: Release v0.2.8
* 09:01 jynus: stop replication and checking data of db2100:s7
* 08:57 marostegui: Upgrade kernel on db2133
* 08:51 marostegui: Upgrade kernel on db2125
* 08:50 marostegui: Upgrade kernel on db2124
* 08:46 marostegui: Upgrade kernel on db2122
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 100%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15652 and previous config saved to /var/cache/conftool/dbconfig/20210429-084011-root.json
* 08:39 marostegui: Upgrade kernel on db2121
* 08:33 marostegui: Upgrade kernel on db2120
* 08:28 volans@deploy1002: Finished deploy [homer/deploy@89cd07c]: Release v0.2.7 (duration: 03m 08s)
* 08:27 marostegui: Upgrade kernel on db2115
* 08:25 volans@deploy1002: Started deploy [homer/deploy@89cd07c]: Release v0.2.7
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 80%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15651 and previous config saved to /var/cache/conftool/dbconfig/20210429-082507-root.json
* 08:19 marostegui: Upgrade kernel on db2114
* 08:12 marostegui: Upgrade kernel on db2109
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 70%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15649 and previous config saved to /var/cache/conftool/dbconfig/20210429-081004-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 60%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15648 and previous config saved to /var/cache/conftool/dbconfig/20210429-075500-root.json
* 07:54 marostegui: Upgrade kernel on db2089
* 07:48 jynus: rolling restart of bacula hosts [[phab:T273182|T273182]]
* 07:48 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 01m 07s)
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15647 and previous config saved to /var/cache/conftool/dbconfig/20210429-074625-root.json
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 50%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15646 and previous config saved to /var/cache/conftool/dbconfig/20210429-073956-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 90%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15645 and previous config saved to /var/cache/conftool/dbconfig/20210429-073122-root.json
* 07:28 marostegui: Stop mysql and upgrade kernel on pc1007
* 07:28 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 01m 08s)
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 40%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15644 and previous config saved to /var/cache/conftool/dbconfig/20210429-072453-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 80%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15643 and previous config saved to /var/cache/conftool/dbconfig/20210429-071618-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 25%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15642 and previous config saved to /var/cache/conftool/dbconfig/20210429-070949-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15641 and previous config saved to /var/cache/conftool/dbconfig/20210429-070114-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 10%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15640 and previous config saved to /var/cache/conftool/dbconfig/20210429-065445-root.json
* 06:53 godog: add 100G to prometheus/ops in eqiad
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 60%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15639 and previous config saved to /var/cache/conftool/dbconfig/20210429-064611-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15637 and previous config saved to /var/cache/conftool/dbconfig/20210429-063107-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 40%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15636 and previous config saved to /var/cache/conftool/dbconfig/20210429-061603-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 30%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15635 and previous config saved to /var/cache/conftool/dbconfig/20210429-060100-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15634 and previous config saved to /var/cache/conftool/dbconfig/20210429-054556-root.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 20%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15633 and previous config saved to /var/cache/conftool/dbconfig/20210429-053052-root.json
* 05:22 marostegui: Check tables on db1121 (this will cause lag on s4 commonswiki, on wikireplicas)
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for tables checking', diff saved to https://phabricator.wikimedia.org/P15632 and previous config saved to /var/cache/conftool/dbconfig/20210429-052146-marostegui.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 15%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15631 and previous config saved to /var/cache/conftool/dbconfig/20210429-051549-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15630 and previous config saved to /var/cache/conftool/dbconfig/20210429-050045-root.json
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15629 and previous config saved to /var/cache/conftool/dbconfig/20210429-045557-marostegui.json
* 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15627 and previous config saved to /var/cache/conftool/dbconfig/20210429-045015-marostegui.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15626 and previous config saved to /var/cache/conftool/dbconfig/20210429-044458-marostegui.json
* 04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
* 04:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15625 and previous config saved to /var/cache/conftool/dbconfig/20210429-043857-marostegui.json
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1156 to dbctl [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15624 and previous config saved to /var/cache/conftool/dbconfig/20210429-043812-marostegui.json
* 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for reimage', diff saved to https://phabricator.wikimedia.org/P15623 and previous config saved to /var/cache/conftool/dbconfig/20210429-042757-marostegui.json
* 02:59 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job (duration: 00m 06s)
* 02:59 milimetric@deploy1002: Started deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job
* 02:58 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b]: Hotfix for referrer job (duration: 14m 40s)
* 02:44 milimetric@deploy1002: Started deploy [analytics/refinery@740226b]: Hotfix for referrer job
* 01:44 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|I5869b3c3ba4a}} (duration: 01m 08s)
* 01:23 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 01:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:19 ryankemper: [[phab:T280382|T280382]] Aborted data transfer; `wdqs2007` is hosed (see https://phabricator.wikimedia.org/T281437)
* 01:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:40 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: [[phab:T281405|T281405]] (duration: 01m 08s)
* 00:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 00:06 ryankemper: [[phab:T280382|T280382]] `wdqs1013.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`


== 2021-04-28 ==
== 2021-09-16 ==
* 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:58 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 23:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 23:51 ryankemper: [[phab:T273673|T273673]] All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'`
* 23:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 23:44 ryankemper: [[phab:T273673|T273673]] The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger
* 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 23:39 ryankemper: [[phab:T273673|T273673]] Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force`
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 23:37 ryankemper: [[phab:T273673|T273673]] Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - [[phab:T273673|T273673]]"'`
* 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 23:21 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 23:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 23:06 dpifke@deploy1002: Finished deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 (duration: 00m 05s)
* 23:19 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 23:06 dpifke@deploy1002: Started deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886
* 23:18 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 22:44 dwisehaupt: civiproxy revision changed to {{Gerrit|99cecb924a}} - initial rollout of code for testing
* 23:18 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 22:26 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 23:17 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 22:26 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:17 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 22:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:16 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 22:18 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 22:38 legoktm@deploy1002: Finished scap: i18n for restoring deprecated token APIs (duration: 15m 30s)
* 22:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 22:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 22:23 legoktm@deploy1002: Started scap: i18n for restoring deprecated token APIs
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 22:21 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/: Restore deprecated token APIs (3/3) (duration: 00m 56s)
* 21:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 22:19 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/autoload.php: Restore deprecated token APIs (2/3) (duration: 00m 56s)
* 21:44 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 22:16 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/ApiTokens.php: Restore deprecated token APIs (1/3) (duration: 00m 56s)
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:22 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
* 21:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 21:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
* 21:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:39 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 20:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:721610{{!}}Set jQuery migrate to false for wikibooks and Commons (T280944)]] (duration: 00m 56s)
* 21:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 ryankemper: [[phab:T280382|T280382]] `wdqs2007` is reachable again; glancing at `/srv/wdqs` its `wikidata.jnl` is `839G` when it should be `975G` so I'll re-do the wikidata journal transfer
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:32 ryankemper: [[phab:T280382|T280382]] [WDQS] `wdqs2007` ssh is unreachable; power cycling via `racadm>>racadm serveraction powercycle`
* 19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.23
* 21:24 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (previous reimage timed out, instance appears to have rebooted)
* 18:55 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:07 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 18:50 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 18:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 18:46 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: {{Gerrit|bb8cba102fe417e8e41b7c4e9179d119c7d25a43}}: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (2/2) (duration: 01m 06s)
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/extension.json: {{Gerrit|bb8cba102fe417e8e41b7c4e9179d119c7d25a43}}: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (1/2) (duration: 01m 07s)
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 17:54 volans: turn off lldp agent on NIC (both ports) on ms-be105[1-9],ms-be205[2-6] - [[phab:T290984|T290984]]
* 20:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:31 volans: turn off lldp agent on NIC (both ports) on ms-be2051 - [[phab:T290984|T290984]]
* 19:57 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 17:09 jynus: deployed extra grants for admin user on s6 primary
* 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-coord1002.eqiad.wmnet
* 19:13 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 07s)
* 16:17 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-coord1002.eqiad.wmnet
* 19:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position) [[phab:T167973|T167973]]
* 18:21 legoktm: added mvolz as listadmin for services@ and reset admin pw ([[phab:T278516|T278516]])
* 16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position)
* 17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Wikibase/client/includes/DataAccess/Scribunto/WikibaseLanguageIndependentLuaBindings.php: {{Gerrit|b392dba0d77904d7de819043e51d8c3fbf003873}}: Fix incorrect ItemId typehint in Lua bindings ([[phab:T281361|T281361]]) (duration: 01m 09s)
* 15:52 bd808: marostegui is awesome and made wikitech better today. :)
* 16:52 papaul: powerdown logstash2034 for relocation
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech on read-only for maintenance [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P17283 and previous config saved to /var/cache/conftool/dbconfig/20210916-150444-marostegui.json
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 15:03 marostegui: Set wikitech on read-only (from now on all SAL changes will fail) [[phab:T167973|T167973]]
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 16:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 14:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
* 16:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
* 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 14:35 mutante: reimaging mwmaint2002 to buster ([[phab:T267607|T267607]], [[phab:T245757|T245757]])
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 16:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 14:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 14:12 mutante: switching https://noc.wikimedia.org from codfw to eqiad ([[phab:T287539|T287539]], [[phab:T267607|T267607]])
* 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 13:44 sukhe: homer: running for Gerrit: 721018: set up BGP peering to durum hosts in <nowiki>{</nowiki>eqiad,codfw,esams,ulsfo,eqsin<nowiki>}</nowiki>
* 16:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 13:25 effie: pool mw1422 mw1455
* 16:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 13:24 effie: pool mw1422 mw1455
* 16:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 13:12 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 01m 04s)
* 15:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 13:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:08 marostegui: Deploy schema change on s2 codfw (lag will show up) [[phab:T290057|T290057]]
* 15:20 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 12:00 mbsantos: start OSM re-import script in maps2009 (depooled)
* 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts conf[2001-2003].codfw.wmnet
* 11:51 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: {{Gerrit|529f86c5a998820c32e7d7f2d952317080383e05}}: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees ([[phab:T291088|T291088]]) (duration: 01m 04s)
* 15:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: {{Gerrit|9e0f6f84240bf621e97806a94a0e786817001668}}: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees ([[phab:T291088|T291088]]) (duration: 01m 04s)
* 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: Fixing incorrect deployment of {{Gerrit|01e4450}} for [[phab:T291123|T291123]]. This is supposed to be a no-op. (duration: 01m 05s)
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 11:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:03 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23 (wmf/1.37.0-wmf.23 * u+2-2)]$ git rebase &&  git submodule update extensions/AbuseFilter/ # fixing an incorrect deployment that happened in [[phab:T291123|T291123]]
* 15:00 moritzm: imported python-poolcounter 0.0.2-1+deb11u1 to apt.wikimedia.org [[phab:T275873|T275873]]
* 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:53 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[2001-2003].codfw.wmnet
* 11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23/extensions/AbuseFilter (wmf/1.37.0-wmf.23 u=)]$ git co {{Gerrit|0d2bc7ca17b9f767ae5753db7e4e41fd9e7d3531}} # reset repo to expected state, fixing incorrect deploy of a backport in [[phab:T291123|T291123]]
* 14:44 moritzm: imported gitlab-ce 13.9.7-ce.0 to apt.wikimedia.org
* 11:34 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
* 14:40 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d] (duration: 04m 59s)
* 11:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:35 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d]
* 11:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d] (duration: 00m 06s)
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 14:34 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d]
* 11:21 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 03m 07s)
* 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:32 moritzm: installing iproute2 updates from buster point release
* 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:31 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:721305{{!}}Add new WikimediaBadges config (T232927)]] (2/2) (duration: 01m 05s)
* 14:30 milimetric@deploy1002: deploy aborted: - (duration: 00m 00s)
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:721305{{!}}Add new WikimediaBadges config (T232927)]] (1/2) (duration: 01m 05s)
* 14:30 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: -
* 11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 14:30 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 12m 31s)
* 11:03 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 14:26 moritzm: installing net-snmp updates from buster point release
* 10:59 hashar@deploy1002: Synchronized php-1.37.0-wmf.21/includes/language/Message.php: Message: Remove deprecated format property - [[phab:T146416|T146416]] [[phab:T291124|T291124]] (duration: 01m 06s)
* 14:17 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 10:21 topranks: Changing default gateway on mw1422 to use VRRP backup (cr2), to determine if tail drops from switches to cr1 is cause of TCP retransmissions.
* 13:15 jayme: restarting pybal on lvs5001,lvs4005,lvs2007 - [[phab:T271573|T271573]]
* 10:14 effie: depool mw1455 for network testing
* 13:14 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 3.17.0-wmf.1"
* 10:11 effie: depool mw1422 for network testing
* 13:10 jayme: restarting pybal on lvs5002,lvs4006,lvs2008 - [[phab:T271573|T271573]]
* 10:01 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 13:04 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 10:01 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 10:00 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 13:03 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 10:00 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 13:02 moritzm: upgrading deployment servers to PHP 7.4.32
* 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2002.wikimedia.org with reason: reimage
* 12:55 moritzm: upgrading snapshot hosts to PHP 7.4.32
* 09:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2002.wikimedia.org with reason: reimage
* 12:48 jayme: restarting pybal on lvs2009 - [[phab:T271573|T271573]]
* 09:10 moritzm: in-place re-installation of mx2002.wikimedia.org (test VM) to test the new installer key support in the sre.puppet.renew-cert cookbook
* 12:45 moritzm: upgrading labweb to PHP 7.4.32
* 08:04 moritzm: upgrading scandium to PHP 7.2 backport of patch for enhanced DOM replaceChild/removeChild performance  [[phab:T291052|T291052]]
* 12:43 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 07:48 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
* 12:42 jayme: restarting pybal on lvs5003,lvs4007 - [[phab:T271573|T271573]]
* 05:35 marostegui: Optimize dewiki.logging in codfw [[phab:T287344|T287344]]
* 12:39 jayme: restarting pybal on lvs2010 - [[phab:T271573|T271573]]
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 12:28 apergos: manually edited /srv/deployment/dumps/dumps-cache/config on snapshots1011,12,13 to change deploy1001 to deploy1002 (where did it get the old value from? these are new installs!)
* 12:16 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:15 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 11:53 jayme: switching SRV record _etcd._tcp to new etcd cluster (for codfw, eqsin, ulsfo)
* 11:22 Urbanecm: EU B&C window done
* 11:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: {{Gerrit|8d0ae5e8fedefa911fc216bfc810d7a6169ea7e5}}: Separate reference preview settings in beta & non-beta ([[phab:T281235|T281235]]) (duration: 01m 08s)
* 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddbc378e41783356e28cd90bbefa08624ea2844c}}: Enable partial action blocks on testwiki ([[phab:T280528|T280528]]) (duration: 01m 07s)
* 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 10:44 jbond42: updated the check-raid nrpe script to python3
* 09:40 moritzm: restarting Tomcat on idp-test1001 to pick up Java security updates
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15618 and previous config saved to /var/cache/conftool/dbconfig/20210428-092103-root.json
* 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1001.wikimedia.org
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint1001.wikimedia.org
* 09:09 moritzm: restarting jenkins* on releases to pick up Java security updates
* 09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15617 and previous config saved to /var/cache/conftool/dbconfig/20210428-090559-root.json
* 08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15616 and previous config saved to /var/cache/conftool/dbconfig/20210428-085056-root.json
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 01m 08s)
* 08:41 urbanecm@deploy1002: sync-file aborted: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 00m 02s)
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15615 and previous config saved to /var/cache/conftool/dbconfig/20210428-083625-marostegui.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15614 and previous config saved to /var/cache/conftool/dbconfig/20210428-083552-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15613 and previous config saved to /var/cache/conftool/dbconfig/20210428-083458-root.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15612 and previous config saved to /var/cache/conftool/dbconfig/20210428-082625-root.json
* 08:25 effie: update php7.2 on jobrunners and parsoid servers && rolling php7.2-fpm restarts
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15611 and previous config saved to /var/cache/conftool/dbconfig/20210428-081121-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15610 and previous config saved to /var/cache/conftool/dbconfig/20210428-075618-root.json
* 07:52 effie: update php7.2 on api servers && rolling php7.2-fpm restarts
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15609 and previous config saved to /var/cache/conftool/dbconfig/20210428-074114-root.json
* 07:40 marostegui: Deploy schema change on db1098:3316 and db1098:3316 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 07:27 effie: update php7.2 on appservers && rolling php7.2-fpm restarts
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for schema change and kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15608 and previous config saved to /var/cache/conftool/dbconfig/20210428-072609-marostegui.json
* 07:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:14 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:12 elukey: add AAAA record for kafka-main200[3,4,5].codfw.wmnet
* 07:10 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:05 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:04 elukey: add AAAA record for kafka-main2002.codfw.wmnet
* 07:03 marostegui: Deploy schema change on db2089:3316 and db1098:3316 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:26 legoktm: created mailman3 superusers for Administrator (noc@), Ladsgroup and Legoktm
* 06:23 legoktm: legoktm@lists1001:~$ sudo mailman-web set_default_site --name lists.wikimedia.org --domain lists.wikimedia.org
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15607 and previous config saved to /var/cache/conftool/dbconfig/20210428-061426-root.json
* 06:00 marostegui: Stop MySQL on db2096 (x1 codfw) [[phab:T281135|T281135]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15606 and previous config saved to /var/cache/conftool/dbconfig/20210428-055922-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1167 in s8 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15605 and previous config saved to /var/cache/conftool/dbconfig/20210428-055144-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15604 and previous config saved to /var/cache/conftool/dbconfig/20210428-054419-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15603 and previous config saved to /var/cache/conftool/dbconfig/20210428-052915-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15602 and previous config saved to /var/cache/conftool/dbconfig/20210428-051526-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 (old s1 master) for schema change', diff saved to https://phabricator.wikimedia.org/P15601 and previous config saved to /var/cache/conftool/dbconfig/20210428-050754-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json
* 05:00 marostegui: Starting s1 eqiad failover from db1083 to db1163 - [[phab:T278214|T278214]]
* 04:14 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 04:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:08 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 04:08 marostegui: Start replication changes, connect everything to db1163 [[phab:T278214|T278214]]
* 04:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json
* 03:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:49 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
* 03:48 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet
* 03:33 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 02:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:28 robh@cumin1001: START - Cookbook sre.dns.netbox
* 01:06 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 00:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE


== 2021-04-27 ==
== 2021-09-15 ==
* 23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 23:02 legoktm: upgrading lists1001 to use postorius 1.3.5
* 23:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 22:51 legoktm: uploaded new mailmanclient/postorius packages to apt1001
* 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE
* 22:38 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 23:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 22:03 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 23:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 23:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 22:02 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@902529b]: 0.3.85 (duration: 06m 59s)
* 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 21:56 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.85` on canary `wdqs1003`; proceeding to rest of fleet
* 21:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet
* 21:55 ryankemper@deploy1002: Started deploy [wdqs/wdqs@902529b]: 0.3.85
* 20:55 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet
* 21:55 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.85`. Pre-deploy tests passing on canary `wdqs1003`
* 20:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet
* 21:42 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs (duration: 02m 07s)
* 20:42 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet
* 21:40 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs
* 20:32 bblack: re-pooling codfw public traffic - [[phab:T279457|T279457]]
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:11 jhuneidi@deploy1002: Synchronized php-1.37.0-wmf.3/includes/rcfeed/IRCColourfulRCFeedFormatter.php: Backport rcfeed: Remove reference assignment ([[phab:T281226|T281226]]) to 1.37.0-wmf.3 (duration: 01m 12s)
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:06 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:44 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|60e7e515d7034a9f839d78851f1dcc2be3df7f3b}}: Set wmgEchoEnablePush to false explicitly on arbcom_* wikis ([[phab:T291128|T291128]]) (duration: 01m 06s)
* 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 19:50 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: sync backport for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/721312 (duration: 01m 06s)
* 19:35 papaul: powerdown ms-backup2001  for maintenance
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 papaul: powerdown logstash2035  for maintenance
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
* 19:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback all wikis to 1.37.0-wmf.23
* 18:50 mutante: people1003 - destroying VM and recreating again from scratch to test if issue of no console and no access is repeatable
* 19:07 urbanecm: Re-start server-side upload for 1 video file, likely temporary swift failure ([[phab:T289781|T289781]])
* 18:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
* 19:06 urbanecm: Start server-side upload for 1 video file ([[phab:T287686|T287686]])
* 18:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 19:04 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 00m 55s)
* 18:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
* 18:33 mutante: people1003 - rebooting, trying to get new VM to work
* 18:52 urbanecm: Start server-side upload for 1 video file ([[phab:T289949|T289949]])
* 18:33 Urbanecm: Morning B&C window done
* 18:50 urbanecm: Start server-side upload for 1 video file ([[phab:T289781|T289781]])
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|91a85f2}}: {{Gerrit|ac770bf}}: Enable language in header for office and testwiki users ([[phab:T280526|T280526]]) (duration: 01m 19s)
* 18:44 urbanecm: Start server-side upload for 3 large PDF files ([[phab:T290722|T290722]])
* 18:32 bblack: lvs2009 - restart pybal + re-run puppet agent - [[phab:T279457|T279457]]
* 18:43 legoktm: migrated sitereq-l@ from Google Groups to Mailman ([[phab:T290908|T290908]])
* 18:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:27 urbanecm: Start server-side upload for 1 video file ([[phab:T290290|T290290]])
* 18:20 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[56].codfw.wmnet
* 18:23 urbanecm: Start server-side upload for 1 video file ([[phab:T290685|T290685]])
* 18:20 bblack: cp203[56] - repooling in etcd - [[phab:T279457|T279457]]
* 18:21 urbanecm: Start server-side upload for 1 video file ([[phab:T290707|T290707]])
* 18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:17 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:17 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:16 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7620084a1ed92066aa8b29fa609cf6cbb4f799ab}}: Add portrattarkiv.se to wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T290581|T290581]]) (duration: 01m 05s)
* 18:11 bblack: dns2001 - restarting bird to repool, then re-enabling puppet - [[phab:T279457|T279457]]
* 17:39 mutante: thumbor - running puppet on all thumbor hosts, removed cron job systemd-thumbor-tmpfiles-clean, added thumbor_systemd_tmpfiles_clean timer job
* 18:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:56 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3] (duration: 06m 15s)
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:50 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3]
* 18:02 ejegg: update payments-wiki from {{Gerrit|9a4eef1375}} to {{Gerrit|44570561f2}}
* 16:47 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3] (duration: 00m 07s)
* 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 16:47 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3]
* 17:58 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 16:45 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3] (duration: 19m 43s)
* 17:34 papaul: powerdown moss-fe2001  for maintenance
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5002.eqsin.wmnet
* 17:32 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:26 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3]
* 17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:19 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5002.eqsin.wmnet
* 17:25 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5001.eqsin.wmnet
* 17:23 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:02 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5001.eqsin.wmnet
* 17:21 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:56 urbanecm: Remove 2FA for User:Rho at wikitech, identity verified via a videocall
* 17:19 ryankemper: [[phab:T281215|T281215]] Banned `elastic2043` from codfw cirrussearch cluster
* 14:50 moritzm: installing lz4 security updates on stretch
* 17:16 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:14 papaul: powerdown kafka-logging2003  for maintenance
* 13:33 ottomata: pointing <nowiki>{</nowiki>stats,analytics<nowiki>}</nowiki>.wikimedia.org at analytics-web.discovery.wmnet cname - [[phab:T285355|T285355]]
* 17:14 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4002.ulsfo.wmnet
* 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4002.ulsfo.wmnet
* 17:09 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:15 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4001.ulsfo.wmnet
* 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4001.ulsfo.wmnet
* 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:52 papaul: powerdown elastic2045  for maintenance
* 11:41 marostegui: Install 10.4.21-2 on db1125
* 16:49 papaul: powerdown ms-be2042 for maintenance
* 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:39 dcaro: reprepro updating packages on thirdparty/ceph-nautilus-buster
* 11:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:21 Lucas_WMDE: EU backport+config window done
* 16:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720983{{!}}Enable change-tags for new edits' proofread status at mulWS (T289140)]] (duration: 01m 06s)
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 39 hosts with reason: upgrading openstack
* 11:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 39 hosts with reason: upgrading openstack
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:583407{{!}}Don’t check constraints on two property qualifiers (T235292)]] (duration: 01m 11s)
* 16:22 effie: upgrading scap 3.17.1-1 on mediawiki canaries - [[phab:T279695|T279695]]
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:18 effie: uploading scap_3.17.1-1
* 10:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
* 16:18 effie: uploading cap_3.17.1-1
* 09:55 effie: depool wtp1026
* 15:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1026.eqiad.wmnet
* 09:54 effie: depooling mw1312 and mw1319
* 14:48 moritzm: installing file/libmagic updates from buster point release
* 09:46 topranks: Disabling Intel X710 NIC on-board LLDP processing on relforge1003 ([[phab:T290984|T290984]])
* 14:47 bblack: lvs2009 - disable puppet + stop pybal (internal services will move to lvs2010, please avoid LVS service definition changes for now!) - [[phab:T279457|T279457]]
* 07:04 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
* 06:57 elukey: shutdown ms-be2045 (again) after seeing [[phab:T290881|T290881]]
* 14:36 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[56].codfw.wmnet
* 06:02 elukey: powercycle ms-be2045 - no ssh, no remote tty available
* 14:36 bblack: cp203[56] - depool all etcd services via confctl - [[phab:T279457|T279457]]
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1109 original load', diff saved to https://phabricator.wikimedia.org/P17274 and previous config saved to /var/cache/conftool/dbconfig/20210915-052802-marostegui.json
* 14:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
* 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17273 and previous config saved to /var/cache/conftool/dbconfig/20210915-043053-marostegui.json
* 14:33 bblack: dns2001 - depooling for [[phab:T279457|T279457]] (disable puppet + stop bird)
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
* 14:31 moritzm: installing imagemagick security updates
* 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
* 14:20 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 14:19 moritzm: installing xen security updates
* 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 14:09 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 14:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 105 hosts with reason: upgrading openstack
* 14:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 105 hosts with reason: upgrading openstack
* 14:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: upgrading openstack
* 14:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: upgrading openstack
* 13:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 13:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 13:55 moritzm: imported jenkins 2.277.3 to thirdparty/ci
* 13:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 13:48 moritzm: uploaded openjdk-8 8u292-b10-0~deb10u1 (buster forward port of latest Java 8 security release)
* 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:45 akosiaris: switchover api-gateway, changeprop, cpjobqueue to use the new redis cluster servers (rdb2007-rdb2010)
* 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:30 hashar: Upgrading CI Jenkins from 2.263.3 to 2.277.2
* 13:23 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1020-1026].eqiad.wmnet
* 13:19 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:13 liw@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.3
* 13:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/includes/Config/WikiPageConfigValidation.php: {{Gerrit|fe2a0420fd884df7046c0c283bcb2e961e74e8e9}}: WikiPageConfigValidation: Mentor lists and help desk can be null ([[phab:T281229|T281229]]) (duration: 01m 06s)
* 13:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 13:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 13:06 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1020-1026].eqiad.wmnet
* 13:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be1019.eqiad.wmnet
* 12:55 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be1019.eqiad.wmnet
* 12:46 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:682815{{!}}Revert "URGENT: Disable GlobalUsage" (T281242)]] (duration: 01m 08s)
* 12:44 hashar: Restarted CI Jenkins for plugins upgrade
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15592 and previous config saved to /var/cache/conftool/dbconfig/20210427-122619-root.json
* 12:20 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GlobalUsage: Backport: [[gerrit:682814{{!}}Avoid reading primary unless absolutely necessary (T281238)]] (duration: 01m 09s)
* 12:12 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GlobalUsage: Backport: [[gerrit:682813{{!}}Avoid reading primary unless absolutely necessary (T281238)]] (duration: 01m 09s)
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15591 and previous config saved to /var/cache/conftool/dbconfig/20210427-121115-root.json
* 12:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 12:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15590 and previous config saved to /var/cache/conftool/dbconfig/20210427-115612-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15589 and previous config saved to /var/cache/conftool/dbconfig/20210427-114108-root.json
* 11:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 11:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove RW from commonswiki', diff saved to https://phabricator.wikimedia.org/P15588 and previous config saved to /var/cache/conftool/dbconfig/20210427-111016-marostegui.json
* 11:09 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Disable GlobalUsage (duration: 01m 08s)
* 10:40 volans@cumin1001: dbctl commit (dc=all): 'S4 RO, outage', diff saved to https://phabricator.wikimedia.org/P15585 and previous config saved to /var/cache/conftool/dbconfig/20210427-104057-volans.json
* 10:18 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 10:06 XioNoX: standardize management routers ACLs with Capirca - mr1-eqiad (last one)
* 10:01 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 02m 16s)
* 09:59 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 09:56 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 22s)
* 09:56 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15584 and previous config saved to /var/cache/conftool/dbconfig/20210427-093536-marostegui.json
* 09:35 XioNoX: standardize management routers ACLs with Capirca - mr1-eqsin
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15583 and previous config saved to /var/cache/conftool/dbconfig/20210427-093501-root.json
* 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
* 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 09:33 moritzm: rolling restart of elastic in relforge* to pick up Java updates
* 09:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
* 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15582 and previous config saved to /var/cache/conftool/dbconfig/20210427-091957-root.json
* 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
* 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 09:17 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
* 09:16 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 09:11 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 09:11 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
* 09:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
* 09:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
* 09:06 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
* 09:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
* 09:05 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15581 and previous config saved to /var/cache/conftool/dbconfig/20210427-090454-root.json
* 09:04 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 09:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
* 09:01 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15580 and previous config saved to /var/cache/conftool/dbconfig/20210427-084950-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15579 and previous config saved to /var/cache/conftool/dbconfig/20210427-084651-marostegui.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15578 and previous config saved to /var/cache/conftool/dbconfig/20210427-084630-root.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114 into main and api', diff saved to https://phabricator.wikimedia.org/P15577 and previous config saved to /var/cache/conftool/dbconfig/20210427-083910-marostegui.json
* 08:36 XioNoX: standardize management routers ACLs with Capirca
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15576 and previous config saved to /var/cache/conftool/dbconfig/20210427-083145-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15575 and previous config saved to /var/cache/conftool/dbconfig/20210427-083126-root.json
* 08:24 hashar: Restarting CI Jenkins for plugins upgrade
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15574 and previous config saved to /var/cache/conftool/dbconfig/20210427-081911-marostegui.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15573 and previous config saved to /var/cache/conftool/dbconfig/20210427-081846-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15572 and previous config saved to /var/cache/conftool/dbconfig/20210427-081623-root.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15571 and previous config saved to /var/cache/conftool/dbconfig/20210427-081325-root.json
* 08:12 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
* 08:11 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
* 08:10 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
* 08:08 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
* 08:03 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 90%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15570 and previous config saved to /var/cache/conftool/dbconfig/20210427-080342-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15569 and previous config saved to /var/cache/conftool/dbconfig/20210427-080119-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15568 and previous config saved to /var/cache/conftool/dbconfig/20210427-075822-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15567 and previous config saved to /var/cache/conftool/dbconfig/20210427-075759-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15566 and previous config saved to /var/cache/conftool/dbconfig/20210427-075738-root.json
* 07:52 liw@deploy1002: Pruned MediaWiki: 1.36.0-wmf.38 (duration: 03m 17s)
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 80%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15565 and previous config saved to /var/cache/conftool/dbconfig/20210427-074839-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15564 and previous config saved to /var/cache/conftool/dbconfig/20210427-074318-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15563 and previous config saved to /var/cache/conftool/dbconfig/20210427-074234-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15562 and previous config saved to /var/cache/conftool/dbconfig/20210427-073335-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15561 and previous config saved to /var/cache/conftool/dbconfig/20210427-072814-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15560 and previous config saved to /var/cache/conftool/dbconfig/20210427-072731-root.json
* 07:26 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 07:24 liw@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.3 (duration: 30m 54s)
* 07:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 07:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 07:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
* 07:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 60%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15559 and previous config saved to /var/cache/conftool/dbconfig/20210427-071831-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15558 and previous config saved to /var/cache/conftool/dbconfig/20210427-071227-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15557 and previous config saved to /var/cache/conftool/dbconfig/20210427-070328-root.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 for schema change', diff saved to https://phabricator.wikimedia.org/P15556 and previous config saved to /var/cache/conftool/dbconfig/20210427-065628-marostegui.json
* 06:55 elukey: upgrade mariadb to 10.4.18-1 + reboot on db1108 - [[phab:T279281|T279281]]
* 06:54 liw@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.3
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 40%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15555 and previous config saved to /var/cache/conftool/dbconfig/20210427-064824-root.json
* 06:37 liw: version 1.37.0-wmf.3 was branched at {{Gerrit|20ab303fd1d883592b4d2ec2468dfaccad7a9e10}} for [[phab:T278347|T278347]]
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 30%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15554 and previous config saved to /var/cache/conftool/dbconfig/20210427-063320-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15553 and previous config saved to /var/cache/conftool/dbconfig/20210427-061817-root.json
* 06:11 elukey: powercycle elastic2043 - no ssh, no tty remote console available
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 20%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15552 and previous config saved to /var/cache/conftool/dbconfig/20210427-060313-root.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 15%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15551 and previous config saved to /var/cache/conftool/dbconfig/20210427-054809-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 10%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15550 and previous config saved to /var/cache/conftool/dbconfig/20210427-053306-root.json
* 05:30 XioNoX: push pfw fw policies - [[phab:T281137|T281137]]
* 05:27 legoktm: imported hyperkitty_1.3.4-2~bpo10+2 to apt.wm.o ([[phab:T281213|T281213]])
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15549 and previous config saved to /var/cache/conftool/dbconfig/20210427-052236-root.json
* 05:21 marostegui: Stop mysql on db1087 to clone db1167 (lag will appear on wikidata on wikireplicas) [[phab:T258361|T258361]]
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1114 temporarily as db1087 will be depooled', diff saved to https://phabricator.wikimedia.org/P15547 and previous config saved to /var/cache/conftool/dbconfig/20210427-052026-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 5%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15546 and previous config saved to /var/cache/conftool/dbconfig/20210427-051802-root.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15545 and previous config saved to /var/cache/conftool/dbconfig/20210427-050826-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15544 and previous config saved to /var/cache/conftool/dbconfig/20210427-050732-root.json
* 05:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1077.eqiad.wmnet
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1077.eqiad.wmnet
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15543 and previous config saved to /var/cache/conftool/dbconfig/20210427-045229-root.json
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15541 and previous config saved to /var/cache/conftool/dbconfig/20210427-044609-marostegui.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 to dbctl, depooled, [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15540 and previous config saved to /var/cache/conftool/dbconfig/20210427-044520-marostegui.json
* 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15539 and previous config saved to /var/cache/conftool/dbconfig/20210427-043725-root.json
* 04:25 legoktm: upgrading lists-next.wikimedia.org to mailman3-from-bullseye ([[phab:T280887|T280887]])
* 04:19 marostegui: Set phabricator on read only [[phab:T279625|T279625]]
* 03:37 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 03:36 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@08ad17a]: 0.3.70 (duration: 08m 18s)
* 03:28 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.70` on canary `wdqs1003`; proceeding to rest of fleet
* 03:28 ryankemper@deploy1002: Started deploy [wdqs/wdqs@08ad17a]: 0.3.70
* 03:27 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.70`. Pre-deploy tests passing on canary `wdqs1003`
* 03:17 ryankemper: [[phab:T280382|T280382]] `wdqs1006` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to raid0: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 02:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:29 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph --task-id [[phab:T280382|T280382]]` on `ryankemper@cumin1001` tmux session `reimage`
* 01:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:21 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
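
The db1157 entries above show a host being repooled in stages (25%, 50%, 75%, 100%). The following is a minimal sketch, not part of dbctl or Stashbot, of how those stages could be pulled out of the log text for review. The function name <code>repool_steps</code> and the regular expression are hypothetical, and the sample entries are shortened copies of the lines above.

```python
import re

# Hypothetical pattern (an assumption, not a dbctl format guarantee): matches
# commit messages such as 'db1157 (re)pooling @ 75%: Repool db1157'.
REPOOL_RE = re.compile(r"'(?P<host>db\d+(?::\d+)?) \(re\)pooling @ (?P<pct>\d+)%")


def repool_steps(lines):
    """Return {host: [percentages]} in chronological order."""
    steps = {}
    # The log lists the newest entry first, so walk it backwards.
    for line in reversed(list(lines)):
        match = REPOOL_RE.search(line)
        if match:
            steps.setdefault(match.group("host"), []).append(int(match.group("pct")))
    return steps


# Shortened copies of entries from the 2021-04-27 section (newest first).
sample = [
    "* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157'",
    "* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157'",
    "* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157'",
    "* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157'",
    "* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove RW from commonswiki'",
]

print(repool_steps(sample))
```

Entries that are not repool commits, such as the 11:10 line, are skipped. The optional <code>:port</code> group is there so that multi-instance names like <code>db1105:3311</code> are kept as separate keys.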


== 2021-04-26 ==
* 23:28 mutante: renewing TLS cert for peopleweb.discovery.wmnet, adding *3 hosts
* 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
* 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
* 22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
* 22:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
* 22:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1006.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 20:48 twentyafterfour: restarting php-fpm on phab1001 to deploy phabricator hotfix {{Gerrit|d238db85b8d8072d99f31805aa4a8a7cf0c09941}}
* 20:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 20:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1003.eqiad.wmnet
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1003.eqiad.wmnet
* 19:45 legoktm: uploaded python3-falcon, python3-mimeparse, python3-mujson, openstack-pkg-tools to mailman3 component on apt.wm.o
* 18:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
* 18:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
* 18:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
* 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
* 18:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
* 18:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
* 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2d16f6251a67cf13cef02bbdcb3c9f5c1c505d16}}: elwiki: Update Growth experiments configuration ([[phab:T280172|T280172]]) (duration: 00m 58s)
* 18:06 urbanecm@deploy1002: Synchronized multiversion/MWScript.php: {{Gerrit|5ace4e1b806bcfc4ea059f9e9cae9aa94c0bdbd1}}: Fix error message if MWScript.php is run without arguments (duration: 00m 58s)
* 17:28 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 17:26 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 17:18 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 17:06 legoktm: imported postorius_1.3.4-2~bpo10+2 to apt.wm.o
* 16:49 mutante: gerrit - restarted apache (hard) to remove time out from gerrit:682502
* 16:40 mutante: gerrit1001 - reload apache2
* 16:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1025.eqiad.wmnet
* 16:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1025.eqiad.wmnet
* 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 15:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 15:21 elukey: restart zookeeper on conf2004 to pick up the -javaagent setting for the prometheus exporter
* 15:06 moritzm: installing jquery security updates on stretch
* 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:48 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:47 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:28 moritzm: installing ldap-replica1003/1004
* 14:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
* 14:03 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15537 and previous config saved to /var/cache/conftool/dbconfig/20210426-133922-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15536 and previous config saved to /var/cache/conftool/dbconfig/20210426-133905-root.json
* 13:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: for zookeeper migration
* 13:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: for zookeeper migration
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15535 and previous config saved to /var/cache/conftool/dbconfig/20210426-132533-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15534 and previous config saved to /var/cache/conftool/dbconfig/20210426-132417-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15533 and previous config saved to /var/cache/conftool/dbconfig/20210426-132402-root.json
* 13:14 moritzm: installing ldap-replica2005/2006
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15532 and previous config saved to /var/cache/conftool/dbconfig/20210426-131029-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15531 and previous config saved to /var/cache/conftool/dbconfig/20210426-130913-root.json
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15530 and previous config saved to /var/cache/conftool/dbconfig/20210426-130858-root.json
* 12:57 moritzm: installing gst-plugins-base1.0 security updates
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15529 and previous config saved to /var/cache/conftool/dbconfig/20210426-125526-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15528 and previous config saved to /var/cache/conftool/dbconfig/20210426-125409-root.json

== 2021-09-14 ==
* 23:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Re-enable VipsScaler (2 of 2) (duration: 01m 04s)
* 22:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable VipsScaler (1 of 2) (duration: 01m 05s)
* 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:43 legoktm: legoktm@cumin2001:~$ sudo systemctl reset-failed # clear httpbb_hourly_tests failure, moved to cumin1001
* 22:34 legoktm@deploy1002: Finished scap: Rebuild i18n for redeployment of VipsScaler ([[phab:T290759|T290759]]) (duration: 23m 49s)
* 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:11 legoktm@deploy1002: Started scap: Rebuild i18n for redeployment of VipsScaler ([[phab:T290759|T290759]])
* 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:20 dancy: testing upcoming Scap release on beta
* 20:20 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720387{{!}}Early adopt wgIncludejQueryMigrate=false on nlwiki (T280944)]] (duration: 01m 48s)
* 20:06 cdanis: [[phab:T290425|T290425]] ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data lyfcttm2lhw4
* 20:06 cdanis: [[phab:T290425|T290425]] ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data h5mvbny28713
* 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.23
* 18:48 moritzm: removed filter for tcp/25 on mx2001, reimage is complete [[phab:T286911|T286911]]
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2982638039720107d0b6e3227f5dce5b34ce7533}}: Offer the DiscussionTools reply tool as opt-out setting at ptwikinews ([[phab:T285162|T285162]]) (duration: 01m 06s)
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7f1de32f4b5788e92291a5448563bc61a9f561e2}}: Offer the DiscussionTools reply tool as opt-out setting at Wikimania wiki ([[phab:T284339|T284339]]) (duration: 01m 05s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e36f4d3dcc368f0afbce3649ce72f2135ab1c76f}}: DiscussionTools: Make newtopictool available to everyone on arwiki and cswiki ([[phab:T285724|T285724]]) (duration: 01m 04s)
* 18:09 urbanecm@deploy1002: Synchronized debug.json: {{Gerrit|Idef64e72}} (duration: 01m 29s)
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: reimage
* 17:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: reimage
* 17:45 moritzm: reimaging mx2001 to bullseye [[phab:T286911|T286911]]
* 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 15:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 15:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
* 15:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 37 hosts
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.remove-downtime for 37 hosts
* 15:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-update-tendril (exit_code=0)
* 15:11 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-update-tendril
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15527 and previous config saved to /var/cache/conftool/dbconfig/20210426-125354-root.json
* 15:10 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15526 and previous config saved to /var/cache/conftool/dbconfig/20210426-124141-marostegui.json
* 15:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15525 and previous config saved to /var/cache/conftool/dbconfig/20210426-124022-root.json
* 15:07 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15524 and previous config saved to /var/cache/conftool/dbconfig/20210426-123020-marostegui.json
* 15:06 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
* 12:28 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 15:05 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
* 12:27 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17271 and previous config saved to /var/cache/conftool/dbconfig/20210914-150458-marostegui.json
* 12:24 Amir1: cleaning watchlist of QuickStatementsBot in wikidatawiki
* 15:03 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 15:00 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 12:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 14:58 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 12:00 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Enable writes on es4 [[phab:T279281|T279281]] (duration: 00m 56s)
* 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17270 and previous config saved to /var/cache/conftool/dbconfig/20210914-145522-marostegui.json
* 11:57 marostegui: Restart es4 primary master - [[phab:T279281|T279281]]
* 14:54 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 11:55 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Disable writes on es4 [[phab:T279281|T279281]] (duration: 00m 56s)
* 14:54 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 11:51 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:53 jelto@cumin2002: END (ERROR) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=97)
* 11:49 hashar@deploy1002: Finished deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki (duration: 00m 13s)
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17269 and previous config saved to /var/cache/conftool/dbconfig/20210914-145324-marostegui.json
* 11:49 hashar@deploy1002: Started deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki
* 14:52 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 11:46 Urbanecm: EU B&C done
* 14:49 jelto@cumin2002: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=99)
* 11:45 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/Dialog.js: {{Gerrit|a347517f906b07b2503ae559c6cc714e1c50e4aa}}: Fix suggested values not being shown when the params type isnt specified ([[phab:T280688|T280688]]) (duration: 00m 57s)
* 14:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:681137{{!}}Revert "Set wgPageImagesAPIDefaultLicense to 'any' for wikidata"]] (duration: 00m 57s)
* 14:49 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 11:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
* 14:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2b5b640ad28bce1df20c2ca82654996d9cfc7630}}: Enable ContentTranslation as a default tool for 11 Wikipedias ([[phab:T279422|T279422]]) (duration: 00m 57s)
* 14:46 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 10:58 effie: restarting php-fpm in mw* clusters in codfw to pick up php7.2 update
* 14:46 jelto@cumin2002: MediaWiki read-only period ends at: 2021-09-14 14:46:30.570035
* 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:682575{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:682575{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 10:38 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1004.wikimedia.org
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 10:37 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Setup wmgUseFooterCodeOfConductLink for later usage (duration: 00m 57s)
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 10:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 10:26 effie: upgrading php7.2 on mw* servers in codfw
* 14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 10:25 marostegui: Deploy schema change on s4 codfw, lag will appear [[phab:T276292|T276292]]
* 14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 10:24 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgUseFooterTechCodeOfConductLink instead of wmgUseFooterCodeOfConductLink (duration: 00m 57s)
* 14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 10:24 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1004.wikimedia.org
* 14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 10:22 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseFooterTechCodeOfConductLink (duration: 00m 59s)
* 14:43 jelto@cumin2002: MediaWiki read-only period starts at: 2021-09-14 14:43:48.272827
* 10:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1003.wikimedia.org
* 14:43 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 10:18 moritzm: installing systemd updates from buster 10.9 point release
* 14:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: DC switchover
* 10:07 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1003.wikimedia.org
* 14:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: DC switchover
* 10:00 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 14:39 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2006.wikimedia.org
* 14:39 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 09:42 moritzm: installing clamav security updates on otrs1001
* 14:34 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 09:38 godog: reboot ms-be1062, kernel backtrace saved
* 14:32 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 09:26 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 14:30 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 09:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2006.wikimedia.org
* 14:24 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 09:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2005.wikimedia.org
* 14:22 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 09:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 14:22 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 09:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:13 jayme: imported etcd-mirror_0.0.6-2 to buster-wikimedia
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005.wikimedia.org
* 14:10 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Avoid warning about undefined $wgFileBlacklist ([[phab:T290640|T290640]]) (duration: 01m 32s)
* 09:07 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2005failoid1002.wikimedia.org
* 13:44 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
* 09:04 jayme: imported etcd-mirror_0.0.6-1 to buster-wikimedia
* 13:43 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
* 08:55 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005failoid1002.wikimedia.org
* 13:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names (duration: 00m 14s)
* 08:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|f01a6dab70f74938dd51668809a181a8f551b6c8}}: GrowthExperiments: Enable community configuration on testwiki ([[phab:T274520|T274520]]) (duration: 00m 57s)
* 13:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|88da8226823e59d1d19db9aeca3b5a5140c0c60c}}: GrowthExperiments: Do not enable community configuration outside of beta wikis ([[phab:T274520|T274520]]) (duration: 00m 59s)
* 13:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
* 08:28 moritzm: update debmonitor to 0.2.9 on remaining hosts [[phab:T281090|T281090]]
* 13:27 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
* 08:13 moritzm: installing lxml security updates on stretch
* 13:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@1ebdca4]: (no justification provided) (duration: 00m 15s)
* 07:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 13:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@1ebdca4]: (no justification provided)
* 07:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 12:32 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 12:32 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 07:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 12:29 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 07:32 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 12:29 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 07:24 moritzm: installing pear security updates
* 12:19 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 07:09 moritzm: removed rawdog from bullseye-wikimedia, needs Py2 [[phab:T280989|T280989]]
* 12:19 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 06:24 elukey: reboot an-coord1001 to pick up kernel security settings (after reimage)
* 12:17 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1158 to dbctl, depooled, [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15521 and previous config saved to /var/cache/conftool/dbconfig/20210426-054700-marostegui.json
* 12:17 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
* 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
* 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 03:43 kart_: Updated cxserver to 2021-04-21-044024-production ([[phab:T279045|T279045]])
* 10:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
* 03:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 03:37 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.20 (duration: 01m 48s)
* 03:32 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:47 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.19 (duration: 04m 13s)
* 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 09:38 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.23 (duration: 70m 39s)
* 09:29 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 09:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 09:09 Emperor: swift rebalance to remove h/w faulty host ms-be2045 [[phab:T290881|T290881]]
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:47 moritzm: installing testvm2002
* 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 08:27 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.23
* 08:25 godog: poweroff ms-be2045 and set it as failed in netbox - [[phab:T290881|T290881]]
* 08:24 hashar: train: applied security patches for 1.37.0-wmf.23  # [[phab:T281164|T281164]]
* 08:05 godog: wipe non-os partitions from ms-be2045 - [[phab:T290881|T290881]]
* 07:50 vgutierrez: update acme-chief to version 0.31 on acmechief hosts - [[phab:T290249|T290249]]
* 04:47 eileen: civicrm revision changed from {{Gerrit|1f071f6c6c}} to {{Gerrit|e6bf81d99c}}, config revision is {{Gerrit|23eda8ba3a}}
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:07 James_F: wmf/1.37.0-wmf.23 was branched at {{Gerrit|ea72c9b690c2159a12beec2f518b61cc499ed521}} for [[phab:T281164|T281164]]
* 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-04-25 ==
== 2021-09-13 ==
* 15:23 Amir1: sudo -u list /var/lib/mailman/bin/change_pw -l wikica-l -p $(pwgen -c1 -s 12) ([[phab:T281066|T281066]])
* 23:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:45 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T290759|T290759]]: Undeploy VipsScaler: III – Don't set wmgUseVips, now ignored (duration: 00m 58s)
* 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:41 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T290759|T290759]]: Undeploy VipsScaler: II – Don't load regardless of config (duration: 00m 58s)
* 19:52 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T290759|T290759]] Undeploy VipsScaler: I – Disable on all wikis (duration: 00m 57s)
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:59 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript resetAuthenticationThrottle.php --wiki=<nowiki>{</nowiki>cswiki,cswikiversity<nowiki>}</nowiki> --signup --ip=185.47.223.49 # [[phab:T290809|T290809]]
* 18:58 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|9db1d1ac938ca053c82fed88c8b6e75f97a52416}}: Add throttle rule for Czech wiki course ([[phab:T290809|T290809]]) (duration: 00m 58s)
* 18:29 ryankemper: [Cirrus] `eqiad` fully recovered (100% of shards), `codfw` at 99.816%. `codfw` is getting held up by recovery of `enwiki` shards which tend to be quite large
* 18:25 razzi: reenable replication on dbstore1007 for [[phab:T290841|T290841]]
* 18:16 cwhite: apply high log volume from ES mitigations to deprecated inputs
* 18:13 razzi: razzi@dbstore1007:~$ sudo systemctl restart mariadb@s3.service for [[phab:T290841|T290841]]
* 18:05 razzi: sudo systemctl restart mariadb@s2.service
* 17:48 ryankemper: [Cirrus] `eqiad` is at 99.13% shards recovered and `codfw` is at 98.83%
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 17:17 ryankemper: [Cirrus] `enwiki` searches appear to be working now. `production-search-eqiad` is at 93.5% recovered shards, `production-search-codfw` is at 95.3% recovered
* 16:57 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 16:18 legoktm@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main
* 16:16 volans@cumin1001: conftool action : set/pooled=yes; selector: name=mw1414.*
* 16:08 volans@cumin1001: conftool action : set/pooled=no; selector: name=mw1414.*
* 16:06 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw1414.eqiad.wmnet
* 15:54 moritzm: filtered mx2001 on the routers for reimage [[phab:T286911|T286911]]
* 15:43 vgutierrez: update acme-chief to version 0.31 on acmechief-test hosts - [[phab:T290249|T290249]]
* 15:40 vgutierrez: upload acme-chief 0.31 to apt.wm.o (buster) - [[phab:T290249|T290249]]
* 15:32 jelto: Traffic: depool codfw from user traffic
* 15:26 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 15:25 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
* 15:25 volans@cumin1001: START - Cookbook sre.experimental.reimage for host mw1414.eqiad.wmnet
* 15:20 Emperor: rebooting ms-be2045 to see if that brings the disk back properly [[phab:T290881|T290881]]
* 15:13 jelto@cumin2002: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async
* 15:13 legoktm: (contd.) box-constraints{{!}}similar-users{{!}}termbox{{!}}thanos-query{{!}}thanos-swift{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero)
* 15:13 rzl: (contd.) box-constraints{{!}}similar-users{{!}}termbox{{!}}thanos-query{{!}}thanos-swift{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero)
* 15:12 jelto@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}eventstreams-internal{{!}}kartotherian{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}shellbox{{!}}shell
* 15:02 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 15:02 topranks: Restarting unused line-card FPC 1 in cr2-codfw in an attempt to clear the alarm.
* 14:56 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 14:44 herron: drained mx2001 mail queue to mx1001 [[phab:T286911|T286911]]
* 14:38 dcausse: restarting wdqs-updater.service on all wdqs servers
* 14:21 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 14:20 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
* 14:13 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
* 14:13 legoktm: (contd.) ternal, eventgate-main, wikifeeds, eventstreams-internal, eventgate-analytics-external: codfw => eqiad
* 14:12 jelto@cumin2002: Switching services echostore, termbox, cxserver, eventstreams, search, ores, mathoid, schema, push-notifications, thanos-swift, wdqs, sessionstore, restbase, wdqs-internal, apertium, eventgate-analytics, citoid, api-gateway, restbase-async, proton, linkrecommendation, thanos-query, shellbox, kartotherian, mobileapps, recommendation-api, zotero, similar-users, shellbox-constraints, eventgate-logging-ex
* 14:12 jelto@cumin2002: START - Cookbook sre.switchdc.services.01-switch-dc
* 14:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 14:05 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 14:03 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3002.esams.wmnet
* 13:51 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3002.esams.wmnet
* 13:50 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3001.esams.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3001.esams.wmnet
* 13:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2002.codfw.wmnet
* 13:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2002.codfw.wmnet
* 13:20 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2001.codfw.wmnet
* 13:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2001.codfw.wmnet
* 12:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:03 volans@cumin1001: START - Cookbook sre.dns.netbox
* 11:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:26 kostajh: European mid-day backport window deploys done
* 11:24 kharlan@deploy1002: Synchronized wmf-config: Config: [[gerrit:713553{{!}}WikimediaEvents: Remove UnderstandingFirstDay config]] (duration: 00m 59s)
* 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 10:43 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 10:15 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=93) for host mw1414.eqiad.wmnet
* 09:33 volans: restarting tcpircbot-logmsgbot on alert1001, not relaying messages
* 09:18 elukey: upgrade rsyslog* on ml-serve* nodes to 8.1901.0-1+wmf2
* 09:16 godog: swift eqiad-prod: add weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 09:11 moritzm: reimaging sretest1002
* 09:11 elukey: upload rsyslog* 8.1901.0-1+wmf2 to buster-wikimedia component/rsyslog-k8s - [[phab:T277739|T277739]]
* 08:16 godog: bump +100G prometheus/ops codfw


== 2021-04-24 ==
== 2021-09-12 ==
* 22:24 bstorm: Rebooting labstore1007 from ilo after crash
* 18:33 vgutierrez: restart varnish-fe on cp3061, cp3063 and cp3065
* 18:29 vgutierrez: restart varnish on cp3055
* 18:26 vgutierrez: restart varnish on cp3057
* 04:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-04-23 ==
== 2021-09-11 ==
* 21:36 foks: removing 1 file for legal compliance
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|27814b8eaacb5ba2fee1b6167a36ea14356a1ecf}}: testwiki: Fully remove securepoll-related groups ([[phab:T290808|T290808]]) (duration: 00m 57s)
* 20:15 mutante: [apt1001:~] $ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/dzahn/rawdog_2.23-2_all.deb ([[phab:T280989|T280989]])
* 18:35 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki <nowiki>{</nowiki>electionadmin,electcomm<nowiki>}</nowiki> # [[phab:T290808|T290808]]
* 19:41 mutante: [apt1001:~] $ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy - copy envoy package from buster to bullseye [[phab:T280989|T280989]]
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|908bbf35235ea4129795dfbf4c0e646440152e18}}: Revert "test: Add electcomm and electionadmin groups" ([[phab:T290808|T290808]]) (duration: 00m 58s)
* 19:09 ebernhardson: closing duplicate/wrong cluster indices in cloudelastic
* 17:02 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:32 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:59 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 14:59 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 14:25 moritzm: revert bullseye image to daily build from last week (to rule out potential reimage issue)
* 13:33 elukey: roll restart of all thanos-swift proxies to pick up new ML account - [[phab:T280773|T280773]]
* 12:50 jbond42: upload new debmonitor-client packages
* 11:50 moritzm: installing perf updates from Buster 10.9 point release
* 10:06 moritzm: installing Linux 4.19.181 updates from Buster 10.9 point release (no reboots, just updating the packages)
* 09:54 moritzm: installing xen security updates
* 09:49 moritzm: installing xorg-server security updates
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15512 and previous config saved to /var/cache/conftool/dbconfig/20210423-093723-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15511 and previous config saved to /var/cache/conftool/dbconfig/20210423-092220-root.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15510 and previous config saved to /var/cache/conftool/dbconfig/20210423-090716-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15509 and previous config saved to /var/cache/conftool/dbconfig/20210423-085212-root.json
* 08:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
* 08:21 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
* 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
* 08:13 moritzm: upgrading d-i image for bullseye to RC1 release [[phab:T275873|T275873]]
* 08:12 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
* 08:12 moritzm: upgrading d-i image for bullseye to RC1 release
* 08:12 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1019.eqiad.wmnet
* 07:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1019.eqiad.wmnet
* 07:56 jynus: deleting db1156 s2 database and reloading it from logical backups [[phab:T280492|T280492]]
* 07:22 Amir1: removing junk bounced email addresses from yahoo from all mailing lists
* 05:40 marostegui: Stop db1079 to clone db1158 (lag will appear on s7 on wiki replicas)
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to clone db1158 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15506 and previous config saved to /var/cache/conftool/dbconfig/20210423-053907-marostegui.json


== 2021-04-22 ==
== 2021-09-10 ==
* 17:26 marostegui: Stop mysql on tendril/dbtree database
* 21:28 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 16:33 volker-e@deploy1002: Finished deploy [design/style-guide@e914e8a]: Deploy design/style-guide: {{Gerrit|e914e8a}} icons: Add 'share' icon (#455) (duration: 00m 06s)
* 21:27 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 16:32 volker-e@deploy1002: Started deploy [design/style-guide@e914e8a]: Deploy design/style-guide: {{Gerrit|e914e8a}} icons: Add 'share' icon (#455)
* 21:21 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 13:23 marostegui: Tendril and dbtree are up but on a degraded status (slow response)
* 20:46 jhuneidi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:19 marostegui: Tendril and dbtree are down at the moment
* 20:44 jhuneidi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 12:46 Urbanecm: Start server-side upload for 2 video files ([[phab:T280763|T280763]], [[phab:T280524|T280524]])
* 20:42 jhuneidi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:31 marostegui: Restart mysql on db1115 (tendril/dbtree will fail)
* 18:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 04:55 eileen: civicrm revision changed from {{Gerrit|42ca3cf65a}} to {{Gerrit|33a63d5789}}, config revision is {{Gerrit|cf07e7ba0b}}
* 18:08 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 02:47 krinkle@deploy1002: Finished deploy [integration/docroot@010e445]: (no justification provided) (duration: 00m 09s)
* 17:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
* 02:47 krinkle@deploy1002: Started deploy [integration/docroot@010e445]: (no justification provided)
* 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
* 01:34 eileen: civicrm revision changed from {{Gerrit|35a8dd33ba}} to {{Gerrit|42ca3cf65a}}, config revision is {{Gerrit|cf07e7ba0b}}
* 16:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
* 00:28 legoktm: legoktm@deneb:/var/cache/pbuilder/aptcache$ sudo rm -rf * # Cleaned up 8GB more
* 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
* 00:27 legoktm: legoktm@deneb:/var/cache/apt/archives$ sudo rm -rf * # cleaned up 6GB
* 16:14 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 00:03 legoktm: subscribed all list admins to the listadmins@ mailing list ([[phab:T280716|T280716]])
* 16:03 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 15:39 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 15:27 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 14:48 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:43 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:31 XioNoX: push pfw policies - [[phab:T290611|T290611]]
* 09:07 mutante: planet - deleted all state files for all languages, running fresh update via systemctl start for all languages after proxy changes ([[phab:T285251|T285251]])
* 08:37 jynus: upgrade and restart db2139
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:58 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-workers - [[phab:T289766|T289766]]
* 07:57 moritzm: installing ntfs-3g security updates
* 07:46 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:45 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:25 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-staging - [[phab:T289766|T289766]]
* 07:19 jayme: imported rsyslog 8.1901.0-1~bpo9+wmf2 to stretch-wikimedia - [[phab:T289766|T289766]]
* 06:56 effie: disable puppet on deploy1002 and mw2254
* 06:29 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 06:27 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 06:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 06:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 06:02 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2280.codfw.wmnet
* 05:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:56 elukey: powercycle mw2280 - no tty available in mgmt, no ssh, host frozen
* 05:55 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet
* 05:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:45 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:12 marostegui: Repool clouddb1017:3311
* 05:12 marostegui: Repool clouddb1013:3311
* 04:49 marostegui: Depool clouddb1013:3311
* 04:49 marostegui: Depool clouddb1017:3311
* 02:52 eileen: civicrm revision changed from {{Gerrit|83f514f693}} to {{Gerrit|1f071f6c6c}}, config revision is {{Gerrit|23eda8ba3a}}
* 00:35 tgr: Deployed patch for [[phab:T290692|T290692]]


== 2021-04-21 ==
== 2021-09-09 ==
* 23:58 eileen: tools revision changed from {{Gerrit|3d950fffbd}} to {{Gerrit|c26a8c0cb6}}
* 23:07 brennen: no takers on patches, ending backport & config training window.
* 23:49 legoktm: made myself and Amir1 list admins for the listadmins@lists.wikimedia.org mailing list
* 21:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1017.eqiad.wmnet
* 21:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1017.eqiad.wmnet
* 21:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1016.eqiad.wmnet
* 21:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.eqiad.wmnet
* 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host planet1003.eqiad.wmnet
* 20:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:40 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:48 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:37 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:48 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:46 mutante: creating a ganeti VM to test bullseye install
* 19:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet1003.eqiad.wmnet
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 bstorm: manually kicking off a run of update-openstack-mirror on sodium to capture an upstream package update
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bc4f20437868b39ae2cc4eac8735ecb8bcd93157}}: Growth: Push 44 wikis out of dark mode ([[phab:T289680|T289680]]) (duration: 00m 57s)
* 18:46 Urbanecm: Morning B&C done
* 18:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 3/3) (duration: 00m 57s)
* 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikibaseMediaInfo/: {{Gerrit|f831d16e42e712832d683233a5b21ad59f7c73b3}}: Make the logistic regression image search default ([[phab:T271799|T271799]]) (duration: 00m 58s)
* 18:22 urbanecm@deploy1002: Synchronized wmf-config/config/: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 2/3) (duration: 01m 01s)
* 18:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f6d076a69607172475a86ba935a273e7519108d1}}: Update $wgGEHomepageNewAccountVariants ([[phab:T278123|T278123]]) (duration: 00m 58s)
* 18:21 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 1/3) (duration: 00m 58s)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1ae5ca5467fad7bfdae8aa94b241fe6c048ab8e5}}: Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere ([[phab:T279853|T279853]]) (duration: 00m 59s)
* 18:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e252de0482c60e87e06d866006bb9ceb186af6cf}}: eswiki: Push Growth features out of dark mode ([[phab:T278235|T278235]]) (duration: 01m 00s)
* 18:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 17:43 jynus: deploy grant changes on m5 backup sources (db1117 and db2078) [[phab:T278614|T278614]]
* 18:20 urbanecm@deploy1002: sync-file aborted: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]) (duration: 00m 05s)
* 15:54 legoktm: [[phab:T280744|T280744]]: legoktm@lists1001:~$ sudo chmod 644 /etc/aliases
* 18:18 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
* 15:15 Urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php # [[phab:T279853|T279853]]
* 18:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15503 and previous config saved to /var/cache/conftool/dbconfig/20210421-151526-root.json
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:02 moritzm: installing jquery security updates on buster
* 18:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15502 and previous config saved to /var/cache/conftool/dbconfig/20210421-150023-root.json
* 18:16 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15501 and previous config saved to /var/cache/conftool/dbconfig/20210421-144519-root.json
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15500 and previous config saved to /var/cache/conftool/dbconfig/20210421-143015-root.json
* 18:12 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php --phab=[[phab:T290582|T290582]] {{!}} tee ~/initwikiconfig.out # [[phab:T290582|T290582]]
* 14:25 jbond42: upload new version of debmonitor-client to apt
* 18:11 urbanecm: Run extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments for wikis in P17258 ([[phab:T290582|T290582]])
* 13:54 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=fawiki # [[phab:T279853|T279853]]
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:39 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiki # [[phab:T279853|T279853]]
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/config: no-op: {{Gerrit|76c51f2753aed9dc8e06b63de6657c3c94371a3c}}: Standardize indentation in several .yaml files (duration: 00m 58s)
* 13:01 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
* 17:29 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 12:21 moritzm: installing failoid2002
* 17:28 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 17:28 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 17:26 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 11:49 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:25 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 11:46 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 17:22 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 11:32 awight: EU backport window complete
* 17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 11:31 moritzm: installing failoid1002
* 17:21 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 11:29 awight@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents: Backport: [[gerrit:681334{{!}}Send 0 edits userEditCountBucket for anons (T210106)]] (duration: 00m 59s)
* 17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
* 10:41 jbond42: switch debmonitor-client to cfssl (second try)
* 17:20 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 10:37 jbond42: upload golang-cfssl packages for jessie and stretch
* 17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 10:33 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid1002.eqiad.wmnet
* 17:14 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-09-09 17:14:12.502162
* 10:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1002.eqiad.wmnet
* 17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 10:23 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid1002.eqiad.wmnet
* 17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 10:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1002.eqiad.wmnet
* 17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 10:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid2002.codfw.wmnet
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 10:21 hnowlan: rebooting eventlog1002 for kernel update
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 10:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid2002.codfw.wmnet
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 09:56 jbond42: switch debmonitor-clients to use cfssl
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15496 and previous config saved to /var/cache/conftool/dbconfig/20210421-093109-root.json
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15495 and previous config saved to /var/cache/conftool/dbconfig/20210421-091605-root.json
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 09:08 elukey: upgrade hue on an-tool1009 to 4.9
* 17:12 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 09:05 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 17:12 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-09-09 17:12:27.974410
* 09:05 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 17:12 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 09:03 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet,service=nginx
* 17:08 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15494 and previous config saved to /var/cache/conftool/dbconfig/20210421-090100-root.json
* 17:07 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
* 17:07 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 17:04 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 17:04 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 16:58 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 08:56 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 08:55 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 16:57 jelto: start cookbook sre.switchdc.mediawiki eqiad codfw --live-test this will generate some additional SAL logs here
* 08:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
* 16:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 08:53 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:52 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 16:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
* 16:10 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 08:50 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 10s)
* 16:00 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 08:50 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 15:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
* 15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
* 15:28 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:719610{{!}}pipeline: add comment redirecting to correct file]] (duration: 00m 59s)
* 08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
* 15:24 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15493 and previous config saved to /var/cache/conftool/dbconfig/20210421-084555-root.json
* 14:47 mutante: planet - deleting all state and lock files for the "en" feeds ([[phab:T285251|T285251]] [[phab:T289984|T289984]])
* 08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
* 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2002.wikimedia.org
* 08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
* 14:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
* 08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
* 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
* 14:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
* 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
* 14:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 08:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
* 14:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
* 13:48 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mx2002.wikimedia.org
* 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
* 13:11 mutante: planet1002 - re-enabling disabled puppet
* 08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
* 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 08:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
* 13:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
* 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
* 13:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
* 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
* 13:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
* 13:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
* 12:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
* 10:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
* 07:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
* 10:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 07:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
* 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
* 10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
* 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
* 10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
* 07:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
* 10:47 topranks: Removing peering to old IPs of AS139931 (BSCCL) at Equinix Singapore (cr3-eqsin).
* 07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
* 10:45 topranks: Removing peering to AS24218 at Equinix Singapore (cr3-eqsin) - network no longer uses this ASN.
* 07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
* 10:22 volans: upgrading spicerack on cumin1001
* 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
* 10:20 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
* 10:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
* 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
* 07:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
* 09:47 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
* 06:49 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:46 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 06:49 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:37 godog: swift eqiad add ms-be10[64-67] with initial weight - [[phab:T290546|T290546]]
* 06:42 elukey: upload hue_4.9.0-2+deb10u1 to buster-wikimedia
* 09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
* 06:11 marostegui: Stop MySQL on db1074 to clone db1156 (there will be lag in s2 in wiki replicas) [[phab:T258361|T258361]]
* 09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone db1156 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15491 and previous config saved to /var/cache/conftool/dbconfig/20210421-061019-marostegui.json
* 09:15 volans: rebooting sretest1001 to test ipmi reboot via spicerack
* 06:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
* 09:15 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
* 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
* 09:15 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
* 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
* 09:13 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
* 09:09 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1086.eqiad.wmnet
* 08:59 godog: move swift traffic fully to codfw to rebalance eqiad - [[phab:T287539|T287539]]
* 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1086.eqiad.wmnet
* 08:59 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
* 00:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 08:58 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=codfw
* 00:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 08:56 volans: upgrading spicerack on cumin2002 to test the new release
* 00:15 ryankemper: [WDQS] Pooled `wdqs1003`
* 08:50 volans: uploaded spicerack_0.0.59 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 00:14 ryankemper: [WDQS] Pooled `wdqs2008`
* 08:23 jelto: run ansible change 719041 on gitlab1001
* 00:07 ryankemper: `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1006.eqiad.wmnet`
* 08:13 jelto: run ansible change 719041 on gitlab2001
* 00:04 ryankemper: [WDQS] pooled `wdqs1004`
* 07:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1002.eqiad.wmnet
* 06:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1002.eqiad.wmnet
* 04:37 ryankemper: [WDQS] Dispatched e-mail to the banned user agent (dailymotion)
* 03:57 ryankemper: [WDQS] Dispatched e-mail to WDQS public mailing list informing them the outage is over; all that's left is the e-mail to the banned UA
* 03:47 ryankemper: [WDQS] Restarting `wdqs-blazegraph` on `wdqs[2001-2008].codfw.wmnet`; if banning the dailymotion UA was sufficient then servers should come back up healthy and not drop back into deadlock
* 03:43 ryankemper: [WDQS] Running puppet agent on `wdqs[2001-2008].codfw.wmnet` to roll out https://gerrit.wikimedia.org/r/719753
* 03:29 ryankemper: [WDQS] There's no clear indication of them being a culprit, but by far the most common user agent is a dailymotion VideocatalogTopic UA (see https://logstash.wikimedia.org/goto/51f238e9010d0220e5d33c6c210be93e)
* 03:12 bstorm: attempting to start replication on clouddb1017 s1 [[phab:T290630|T290630]]
* 03:11 bstorm: stopping and restarting mariadb on clouddb1017 s1
* 03:04 ryankemper: [WDQS] Dispatched email to Wikidata public mailing list about reduced service availability
* 02:36 ryankemper: [WDQS] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&from=1631152574841&to=1631154942992 shows the availability pattern, anywhere we see missing data (null) represents time that blazegraph was locked up and therefore unable to report metrics
* 02:34 ryankemper: [WDQS] For context I glanced at `ryankemper@cumin1001:~$ sudo -E cumin 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki>' 'sudo systemctl status wdqs-blazegraph'` before doing the aforementioned restarts and they'd all last restarted between 25-28 minutes ago
* 02:33 ryankemper: [WDQS] Restarting `wdqs-blazegraph` across all of `wdqs2*`
* 00:50 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Don't set default  to Score (try #2) (duration: 00m 58s)
* 00:48 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/Score/includes/Score.php: Use the 'score' Shellbox if configured ([[phab:T290193|T290193]]) (duration: 00m 57s)
* 00:46 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/includes/shell/CommandFactory.php: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand ([[phab:T290193|T290193]]) (duration: 00m 58s)
* 00:45 legoktm@deploy1002: sync-file aborted: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand ([[phab:T290193|T290193]]) (duration: 00m 07s)
* 00:15 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove putenv() for GDFONTPATH (duration: 00m 58s)


== 2021-04-20 ==
== 2021-09-08 ==
* 23:46 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 57s)
* 22:34 ryankemper: [WDQS] [[phab:T280247|T280247]] Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/717649
* 23:44 urbanecm@deploy1002: Synchronized wmf-config/config/urwiki.yaml: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 57s)
* 22:24 ryankemper: [WDQS] [[phab:T280247|T280247]] Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/714623
* 23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 58s)
* 21:55 ryankemper: [WDQS] [[phab:T280247|T280247]] Purged varnish to make sure change took effect: `echo 'https://query-preview.wikidata.org/' {{!}} mwscript purgeList.php` and `echo 'https://query.wikidata.org/' {{!}} mwscript purgeList.php` on `mwmaint1002`
* 23:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=urwiki GrowthExperiments # [[phab:T280067|T280067]]
* 21:53 ryankemper: [WDQS] [[phab:T280247|T280247]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719502 and ran puppet-agent on `miscweb*`
* 23:38 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 3/3) (duration: 00m 56s)
* 20:49 eileen: civicrm revision changed from {{Gerrit|593d01f4fc}} to {{Gerrit|83f514f693}}, config revision is {{Gerrit|23eda8ba3a}}
* 23:36 urbanecm@deploy1002: Synchronized wmf-config/config/elwiki.yaml: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 2/3) (duration: 00m 57s)
* 20:41 legoktm: Successfully published image docker-registry.discovery.wmnet/php7.2-fpm-multiversion-base:1.0.2
* 23:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 1/3) (duration: 00m 57s)
* 19:25 Krinkle: krinkle@mw1369 Running some benchmarks in Eqiad on load.php
* 23:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
* 18:27 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|6bcbe61f9a89086b775d84a81d55a7587cf26780}}: Italian Wikipedia is now a group 1 wiki ([[phab:T286664|T286664]]; 2/2) (duration: 00m 58s)
* 23:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=elwiki GrowthExperiments # [[phab:T280172|T280172]]
* 18:26 urbanecm@deploy1002: Synchronized dblists/: {{Gerrit|6bcbe61f9a89086b775d84a81d55a7587cf26780}}: Italian Wikipedia is now a group 1 wiki ([[phab:T286664|T286664]]; 1/2) (duration: 00m 58s)
* 23:31 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 3/3) (duration: 00m 57s)
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bbefce6a3778f159ad68587c830dff4a1da0c792}}: Growth: Remove config that moved on-wiki ([[phab:T290295|T290295]]) (duration: 00m 58s)
* 23:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist growthexperiments sql.php --cluster=extension1 /srv/mediawiki/php-1.37.0-wmf.1/extensions/GrowthExperiments/maintenance/schemas/mysql/growthexperiments_mentee_data.sql # [[phab:T279587|T279587]]
* 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|950a377e5ba6f5d318135e31b36334532d9ae71b}}: Stop setting $wgAbuseFilterParserClass ([[phab:T239990|T239990]]) (duration: 00m 58s)
* 23:28 urbanecm@deploy1002: Synchronized wmf-config/config/cawiki.yaml: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 2/3) (duration: 00m 57s)
* 17:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2004.codfw.wmnet
* 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 1/3) (duration: 00m 57s)
* 16:53 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2004.codfw.wmnet
* 23:24 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=cawiki GrowthExperiments # [[phab:T280673|T280673]]
* 16:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2003.codfw.wmnet
* 23:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2003.codfw.wmnet
* 23:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2001.codfw.wmnet
* 23:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
* 16:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|796e23c87ccfc48334ab932e13aab4f0ec746bbd}}: updateMenteeData.php: Make it possible to force update (duration: 00m 58s)
* 23:03 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
* 16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:719524{{!}}Turn off jQuery migrate on wikisource wikis (T280944)]] (duration: 00m 59s)
* 22:14 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2001.codfw.wmnet
* 22:10 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
* 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 16:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 20:52 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ruwiki # [[phab:T279853|T279853]]
* 16:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1020.wikimedia.org
* 16:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 20:41 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=viwiki # [[phab:T279853|T279853]]
* 16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 20:36 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1020.wikimedia.org
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
* 20:36 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ukwiki # [[phab:T279853|T279853]]
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
* 20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1017-1019].wikimedia.org
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
* 20:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=tewiki # [[phab:T279853|T279853]]
* 15:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 20:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=svwiki # [[phab:T279853|T279853]]
* 15:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
* 20:30 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=srwiki # [[phab:T279853|T279853]]
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
* 20:29 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=rowiki # [[phab:T279853|T279853]]
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
* 20:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hywiki # [[phab:T279853|T279853]]
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 20:22 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=huwiki # [[phab:T279853|T279853]]
* 14:57 marostegui: Retroactive: started to warm up eqiad databases
* 20:21 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hrwiki # [[phab:T279853|T279853]]
* 14:57 moritzm: installing 4.19.194 kernels on stretch systems with 4.19.x (no reboots yet)
* 20:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hewiki # [[phab:T279853|T279853]]
* 14:54 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.2.3 ([[phab:T289802|T289802]])
* 20:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiktionary # [[phab:T279853|T279853]]
* 14:53 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
* 20:16 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1017-1019].wikimedia.org
* 14:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
* 20:15 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=euwiki # [[phab:T279853|T279853]]
* 14:33 moritzm: installing zeromq3 security updates
* 20:13 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=bnwiki # [[phab:T279853|T279853]]
* 13:50 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15 (duration: 06m 42s)
* 20:08 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:44 mbsantos@deploy1002: Started deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15
* 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
* 13:38 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.1.5 ([[phab:T289802|T289802]])
* 19:58 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:13 brennen: gitlab1001: downtiming alerts for 2.5 hours; upgrading to 14.0.10 ([[phab:T289802|T289802]])
* 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
* 12:45 brennen: gitlab: pausing all runners in preparation for upgrade to 14.0.10 ([[phab:T289802|T289802]])
* 19:28 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1016.wikimedia.org
* 11:57 moritzm: installing curl security updates on stretch
* 18:34 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=idwiki # [[phab:T279853|T279853]]
* 11:09 jbond: upload statograph_0.1.2
* 18:33 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.wikimedia.org
* 11:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/: {{Gerrit|4d1969d}}: {{Gerrit|1fbb8e9}}: MentorStore: Set wasPosted to true in command line mode ([[phab:T275773|T275773]]) (duration: 00m 59s)
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 17:26 XioNoX: boot cr1-codfw:fpc1 - [[phab:T277341|T277341]]
* 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 17:16 papaul: Adding an MPC7E to cr1-codfw
* 10:06 jelto: upgrade gitlab2001 to gitlab-ce=14.0.10-ce.0
* 16:32 arturo: merging change to core route firewall https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681316 ([[phab:T272587|T272587]])
* 10:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T289802
* 16:15 andrewbogott: updating core routers config with https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681315
* 10:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T289802
* 15:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host eventlog1003.eqiad.wmnet
* 09:38 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to wikimedia.org - [[phab:T210137|T210137]]
* 15:22 urbanecm@deploy1002: Synchronized docroot/noc/conf/debug.json: {{Gerrit|dc6647b9c674429c0811116e0caca7639b766e77}}: remove mwdebug1003 from list of debug servers ([[phab:T267248|T267248]]) (duration: 00m 58s)
* 09:29 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to codfw - [[phab:T210137|T210137]]
* 15:20 urbanecm@deploy1002: Synchronized debug.json: {{Gerrit|dc6647b9c674429c0811116e0caca7639b766e77}}: remove mwdebug1003 from list of debug servers ([[phab:T267248|T267248]]) (duration: 00m 57s)
* 09:09 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqiad - [[phab:T210137|T210137]]
* 15:14 hnowlan@cumin1001: START - Cookbook sre.ganeti.makevm for new host eventlog1003.eqiad.wmnet
* 07:45 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqsin/esams/ulsfo - [[phab:T210137|T210137]]
* 15:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:46 ryankemper: [WDQS] Manually running puppet-agent on `miscweb2002.codfw.wmnet,miscweb1002.eqiad.wmnet`
* 15:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 06:45 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719185 to rollback query.wikidata.org changes
* 14:59 volker-e@deploy1002: Finished deploy [design/style-guide@c4d8314]: Deploy design/style-guide: {{Gerrit|c4d8314}} “Components”: Fix “Buttons” active states (#460) (duration: 00m 07s)
* 02:59 eileen: civicrm revision changed from {{Gerrit|06ef98593f}} to {{Gerrit|593d01f4fc}}, config revision is {{Gerrit|5f004d94d7}}
* 14:58 volker-e@deploy1002: Started deploy [design/style-guide@c4d8314]: Deploy design/style-guide: {{Gerrit|c4d8314}} “Components”: Fix “Buttons” active states (#460)
* 00:00 legoktm: legoktm@lists1001:~$ sudo rm -rf /etc/mailman # cleanup as part of {{Gerrit|4869d91b0be}} / [[phab:T282303|T282303]]
* 14:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 14:38 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 14:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 14:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:30 moritzm: installing exim updates from Buster point release
* 14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:25 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a] (duration: 04m 56s)
* 14:25 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:24 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:20 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a]
* 14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:17 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a] (duration: 00m 07s)
* 14:17 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a]
* 14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a] (duration: 00m 03s)
* 14:16 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a]
* 14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a] (duration: 00m 03s)
* 14:15 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a]
* 14:15 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a] (duration: 00m 03s)
* 14:14 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a]
* 14:14 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a] (duration: 14m 50s)
* 14:11 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:04 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:01 jiji@cumin1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet,cluster=videoscaler
* 13:59 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a]
* 13:42 moritzm: upgrading mw1276 to PHP 7.2.34
* 13:40 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:40 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 13s)
* 13:40 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 13:38 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:36 otto@deploy1002: Finished deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint (duration: 05m 17s)
* 13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:33 moritzm: upgrading mw1261 to PHP 7.2.34
* 13:31 otto@deploy1002: Started deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint
* 13:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:26 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:22 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:21 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 13:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/includes/actions/RollbackAction.php: {{Gerrit|ccbfcf28a2f507ed40dcf7af748c30f581b5079f}}: Do not mark rollbacks as bot edits ([[phab:T280655|T280655]]) (duration: 00m 57s)
* 13:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 13:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:07 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
* 13:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
* 12:58 moritzm: reimaging cumin2002 to bullseye [[phab:T276589|T276589]]
* 12:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 12:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:49 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:47 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:42 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf1 to component/php72
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 to check its tables [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15483 and previous config saved to /var/cache/conftool/dbconfig/20210420-124118-marostegui.json
* 12:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 12:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:18 CFisch_WMDE: European mid-day backport window done
* 12:05 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:681321{{!}}Add NS_PROJECT alias for azwiki (T280577)]] (duration: 00m 57s)
* 12:04 moritzm: drain ganeti5003
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:54 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: [[gerrit:681153{{!}}CommentFormatter: Add ext-discussiontools-section class instead of overwriting (T280433)]] (duration: 00m 57s)
* 11:47 moritzm: failover ganeti master in eqsin to ganeti5001
* 11:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:38 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/VisualEditor/modules/ve-mw/ui/pages/ve.ui.MWParameterPage.js: Backport: [[gerrit:679462{{!}}Add filtering for the suggested values combo box (T271898)]] (duration: 00m 58s)
* 11:15 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:676930{{!}}Add default import sources (T214139)]] (duration: 00m 58s)
* 11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:07 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:49 _joe_: temporarily installing some python packages on deploy1002 for testing
* 10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 10:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 10:20 moritzm: drain ganeti5001
* 10:11 hnowlan: opening access to cassandra on new AQS hosts (aqs101*) to analytics-in4 filter
* 10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1001.eqiad.wmnet
* 10:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host aphlict1001.eqiad.wmnet
* 09:42 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
* 09:42 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
* 09:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 09:40 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 08:50 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 08:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
* 08:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
* 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
* 08:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
* 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
* 08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
* 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
* 08:09 dcaro: reprepro updating thirdparty/ceph-octopus repo
* 08:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
* 08:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
* 08:05 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 08:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1086 from dbctl [[phab:T278229|T278229]]', diff saved to https://phabricator.wikimedia.org/P15482 and previous config saved to /var/cache/conftool/dbconfig/20210420-075949-marostegui.json
* 07:38 XioNoX: BGP: prioritize directly connected peers - [[phab:T280054|T280054]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15480 and previous config saved to /var/cache/conftool/dbconfig/20210420-073808-root.json
* 07:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
* 07:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15479 and previous config saved to /var/cache/conftool/dbconfig/20210420-072305-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15478 and previous config saved to /var/cache/conftool/dbconfig/20210420-070801-root.json
* 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15477 and previous config saved to /var/cache/conftool/dbconfig/20210420-065257-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
* 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
* 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE


== 2021-04-19 ==
== 2021-09-07 ==
* 22:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 23:25 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 23:20 robh@cumin1001: START - Cookbook sre.dns.netbox
* 22:37 Trey314159: reindexing wikidata on cloudelastic finished/failed ([[phab:T274200|T274200]])
* 23:13 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:719381{{!}}Enable UrlShortener everywhere (T267925)]] (duration: 00m 58s)
* 22:37 Trey314159: reindexing commons and wikidata on elastic@eqiad finished/failed ([[phab:T274200|T274200]])
* 23:07 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Config: [[gerrit:716041{{!}}profiler: use seperate pipeline inside k8s pods (T288165)]] (duration: 00m 58s)
* 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
* 22:29 cstone: SmashPig revision changed from {{Gerrit|afd362b163}} to {{Gerrit|3607b16f83}}
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
* 20:41 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715018{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on all wikis (T251480)]] (duration: 00m 59s)
* 21:03 sbassett: Deployed security patch for [[phab:T280226|T280226]]
* 20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:56 dcausse: repool wdqs1005
* 20:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
* 17:18 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
* 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
* 17:01 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:56 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/tests: Factor out rollback logic from WikiPage - /tests (duration: 00m 59s)
* 16:39 moritzm: installing jetty9 security updates on buster
* 18:55 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/maintenance: Factor out rollback logic from WikiPage - /maintenance (duration: 00m 57s)
* 16:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 18:51 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/includes/: Factor out rollback logic from WikiPage - /includes (duration: 01m 01s)
* 16:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 18:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
* 16:30 dancy@deploy1002: Synchronized README: testing (duration: 00m 59s)
* 18:47 jiji@cumin1001: conftool action : set/pooled=yes; selector: cluster=thumbor,name=thumbor2001.codfw.wmnet
* 15:18 akosiaris: run_benchmarky.py against mwdebug.svc.codfw.wmnet for performance tests
* 18:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2002.codfw.wmnet
* 15:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:39 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T274436|T274436]] Math: Enable RESTBase-less Wikidata math validation (duration: 00m 56s)
* 15:04 jbond: upload python-prometheus-client_0.6.0 to stretch-wikimedia
* 18:34 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2002.codfw.wmnet
* 14:50 mutante: snapshot1015 - manually removed prometheus-puppet-agent-stats from crontab which was sending spam and is now a timer
* 18:21 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T249745|T249745]] [EventBus] Make eventgate-main timeout consistent with envoy (duration: 00m 56s)
* 14:33 mutante: CI - migrating zuul-merger cronjob to systemd timer (contint*)
* 18:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/: {{Gerrit|66d137b75a7073c7162c443cc8c6ec6f3be714e0}}: Remove <header> tags around headings for compat with MobileFrontend ([[phab:T280433|T280433]]) (duration: 00m 59s)
* 14:23 XioNoX: re-pool esams-eqiad - [[phab:T288503|T288503]]
* 18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2001.codfw.wmnet
* 14:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
* 18:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/includes/Mentorship/Store/DatabaseMentorStore.php: {{Gerrit|0233507470377f6ac45768e345cd2e359e5d0e57}}: DatabaseMentorStore: Fix deprecation warning in upsert query ([[phab:T280525|T280525]]) (duration: 00m 57s)
* 14:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
* 17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2001.codfw.wmnet
* 14:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
* 17:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1004.eqiad.wmnet
* 14:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
* 17:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1004.eqiad.wmnet
* 14:17 marostegui: No more db maintenance on eqiad [[phab:T288594|T288594]]
* 17:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1003.eqiad.wmnet
* 14:08 mutante: alert1001 - temp disabled puppet, stopped icinga-wm
* 17:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1003.eqiad.wmnet
* 14:07 mutante: temp killed icinga-wm because of flooding
* 17:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1002.eqiad.wmnet
* 14:01 Emperor: removing pc2010 from orchestrator [[phab:T289117|T289117]]
* 16:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
* 13:59 Emperor: removing pc2010 from tendril and zarcillo [[phab:T289117|T289117]]
* 16:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1001.eqiad.wmnet
* 13:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
* 13:57 XioNoX: drain esams-eqiad for circuit maintenance - [[phab:T288503|T288503]]
* 16:25 hoo: Updated the Wikidata property suggester with data from the 2021-04-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 13:54 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15474 and previous config saved to /var/cache/conftool/dbconfig/20210419-161134-root.json
* 13:51 jayme: uncordoned kubestage2001
* 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 90%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15473 and previous config saved to /var/cache/conftool/dbconfig/20210419-155631-root.json
* 13:50 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 80%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15472 and previous config saved to /var/cache/conftool/dbconfig/20210419-154127-root.json
* 13:49 mutante: mw2264 - scap pulled and repooled after [[phab:T290242|T290242]]
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 70%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15471 and previous config saved to /var/cache/conftool/dbconfig/20210419-152623-root.json
* 13:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2264.codfw.wmnet
* 15:24 volans: reverted debmonitor-client to 0.2.0-1 on apt.w.o for jessie-wikimedia
* 13:43 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 60%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15470 and previous config saved to /var/cache/conftool/dbconfig/20210419-151119-root.json
* 13:40 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2010.codfw.wmnet
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15469 and previous config saved to /var/cache/conftool/dbconfig/20210419-145616-root.json
* 13:25 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2010.codfw.wmnet
* 14:53 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename RelatedArticles wmg variables to wg (duration: 00m 56s)
* 13:21 Emperor: removing pc2009 from orchestrator [[phab:T289116|T289116]]
* 14:53 jbond42: update debmonitor-client - [[phab:T280484|T280484]]
* 13:21 Emperor: removing pc2009 from tendril and zarcillo [[phab:T289116|T289116]]
* 14:52 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove RelatedArticles extension function and wmg to wg mapping (duration: 00m 56s)
* 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'fix s8 weights [[phab:T288594|T288594]]', diff saved to https://phabricator.wikimedia.org/P17248 and previous config saved to /var/cache/conftool/dbconfig/20210907-130244-marostegui.json
* 14:48 reedy@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Use namespaced PoolCounter Client (duration: 00m 57s)
* 12:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2009.codfw.wmnet
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 [[phab:T278229|T278229]]', diff saved to https://phabricator.wikimedia.org/P15468 and previous config saved to /var/cache/conftool/dbconfig/20210419-144422-marostegui.json
* 12:51 mvernon@deploy1002: Synchronized wmf-config/ProductionServices.php: Remove old decommissioned pc hosts [[phab:T284825|T284825]] (duration: 01m 02s)
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 40%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15467 and previous config saved to /var/cache/conftool/dbconfig/20210419-144112-root.json
* 12:45 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2009.codfw.wmnet
* 14:41 volans: uploaded debmonitor-client 0.2.8 to apt.w.o for jessie, stretch, buster, bullseye
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'fix s1 weights [[phab:T288594|T288594]]', diff saved to https://phabricator.wikimedia.org/P17247 and previous config saved to /var/cache/conftool/dbconfig/20210907-122747-marostegui.json
* 14:29 hnowlan: imported envoyproxy_1.16.3-1 debs to envoy-future component
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'fix s1 weights [[phab:T288594|T288594]]', diff saved to https://phabricator.wikimedia.org/P17246 and previous config saved to /var/cache/conftool/dbconfig/20210907-122708-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 30%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15466 and previous config saved to /var/cache/conftool/dbconfig/20210419-142608-root.json
* 11:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 20%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15465 and previous config saved to /var/cache/conftool/dbconfig/20210419-141105-root.json
* 11:46 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 15%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15464 and previous config saved to /var/cache/conftool/dbconfig/20210419-135601-root.json
* 11:36 awight: EU backport complete
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15463 and previous config saved to /var/cache/conftool/dbconfig/20210419-134057-root.json
* 11:33 awight@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/CodeMirror/extension.json: Backport: [[gerrit:719170{{!}}Change line numbers default to null (T290226)]] (duration: 00m 59s)
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15462 and previous config saved to /var/cache/conftool/dbconfig/20210419-132554-root.json
* 11:28 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:717192{{!}}Set template namespace for code mirror line numbering (T290226)]] (duration: 00m 59s)
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15461 and previous config saved to /var/cache/conftool/dbconfig/20210419-131936-marostegui.json
* 10:51 Emperor: removing pc2008 from orchestrator [[phab:T289115|T289115]]
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15460 and previous config saved to /var/cache/conftool/dbconfig/20210419-131501-marostegui.json
* 10:49 Emperor: removing pc2008 from tendril and zarcillo [[phab:T289115|T289115]]
* 12:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bd076306c0ae0428ff13743f499b2a02d42b6eab}}: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD everywhere ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 10:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2008.codfw.wmnet
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15459 and previous config saved to /var/cache/conftool/dbconfig/20210419-125600-marostegui.json
* 10:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2008.codfw.wmnet
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15458 and previous config saved to /var/cache/conftool/dbconfig/20210419-125407-marostegui.json
* 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1182 to dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15457 and previous config saved to /var/cache/conftool/dbconfig/20210419-125301-marostegui.json
* 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts
* 12:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef0f68e2a9c1c638911bb06c47ba6e8ef88ee393}}: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_NEW ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: commissioning aqs_new hosts
* 12:38 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # [[phab:T279853|T279853]]
* 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: commissioning aqs_new hosts
* 12:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
* 10:27 Emperor: removing pc1010 from orchestrator [[phab:T289122|T289122]]
* 12:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3e3cce192f1e99cbcae739f234271411d10974ac}}: cswiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD ([[phab:T279853|T279853]]) (duration: 00m 58s)
* 10:22 Emperor: removing pc1010 from tendril and zarcillo [[phab:T289122|T289122]]
* 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
* 10:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1010.eqiad.wmnet
* 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
* 10:02 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1010.eqiad.wmnet
* 12:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
* 09:46 Emperor: removing pc1009 from orchestrator [[phab:T289120|T289120]]
* 11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
* 09:26 Emperor: removing pc1009 from tendril and zarcillo [[phab:T289120|T289120]]
* 11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 09:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1009.eqiad.wmnet
* 11:33 moritzm: imported debdeploy 0.0.99.13-1+deb11u1 to bullseye-wikimedia [[phab:T275873|T275873]]
* 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1009.eqiad.wmnet
* 11:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki --force # [[phab:T279853|T279853]]
* 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:11 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki # [[phab:T279853|T279853]]
* 08:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|03f8ed819091624f5ae4a8d7ed3631dc322fabcd}}: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 08:51 Emperor: removing pc1008 from orchestrator [[phab:T289119|T289119]]
* 11:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:680871{{!}}Disable legacy javascript variable for the rest of wikis (T72470)]] (duration: 00m 57s)
* 08:44 Emperor: removing pc1008 from tendril and zarcillo [[phab:T289119|T289119]]
* 11:02 moritzm: import prometheus-rsyslog-exporter for bullseye-wikimedia/main
* 08:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1008.eqiad.wmnet
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
* 08:31 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1008.eqiad.wmnet
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17241 and previous config saved to /var/cache/conftool/dbconfig/20210907-082952-marostegui.json
* 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:681008{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:681008{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:34 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 08:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:24 hnowlan: imported 1.16.3 into envoy-future
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 100%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17240 and previous config saved to /var/cache/conftool/dbconfig/20210907-080230-root.json
* 10:22 moritzm: reimaging theemin to bullseye
* 07:52 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17239 and previous config saved to /var/cache/conftool/dbconfig/20210907-075235-kormat.json
* 10:15 dcausse: depooling wdqs1005
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17238 and previous config saved to /var/cache/conftool/dbconfig/20210907-074901-marostegui.json
* 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 75%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17237 and previous config saved to /var/cache/conftool/dbconfig/20210907-074726-root.json
* 10:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 07:37 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17236 and previous config saved to /var/cache/conftool/dbconfig/20210907-073731-kormat.json
* 10:05 arturo: aborrero@apt1001:~ $ sudo -i reprepro --component thirdparty/kubeadm-k8s-1-18 update buster-wikimedia
* 07:37 godog: +100G for prometheus/k8s codfw
* 10:04 arturo: aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished (remove old buster-wikimedia{{!}}thirdparty/kubeadm-k8s-1-15,16 repos and packages)
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Start to pool db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17235 and previous config saved to /var/cache/conftool/dbconfig/20210907-073436-marostegui.json
* 09:56 ema: cp3051: varnish-frontend-restart to apply exp policy settings changes starting from empty cache [[phab:T275809|T275809]]
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 50%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17234 and previous config saved to /var/cache/conftool/dbconfig/20210907-073222-root.json
* 09:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
* 07:22 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 50%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17233 and previous config saved to /var/cache/conftool/dbconfig/20210907-072227-kormat.json
* 09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 25%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17232 and previous config saved to /var/cache/conftool/dbconfig/20210907-071719-root.json
* 09:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15454 and previous config saved to /var/cache/conftool/dbconfig/20210419-092251-root.json
* 07:07 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17231 and previous config saved to /var/cache/conftool/dbconfig/20210907-070724-kormat.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15453 and previous config saved to /var/cache/conftool/dbconfig/20210419-092234-marostegui.json
* 07:07 kormat@cumin1001: dbctl commit (dc=all): 'Fixing db2118's pooling config [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17230 and previous config saved to /var/cache/conftool/dbconfig/20210907-070702-kormat.json
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15452 and previous config saved to /var/cache/conftool/dbconfig/20210419-091535-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 10%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17229 and previous config saved to /var/cache/conftool/dbconfig/20210907-070215-root.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 90%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15451 and previous config saved to /var/cache/conftool/dbconfig/20210419-090747-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 5%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17228 and previous config saved to /var/cache/conftool/dbconfig/20210907-064711-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15450 and previous config saved to /var/cache/conftool/dbconfig/20210419-090031-root.json
* 05:15 marostegui: Optimize eowiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 80%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15449 and previous config saved to /var/cache/conftool/dbconfig/20210419-085243-root.json
* 05:15 marostegui: Optimize vecwiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15448 and previous config saved to /var/cache/conftool/dbconfig/20210419-084834-marostegui.json
* 05:14 marostegui: Optimize kawiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15447 and previous config saved to /var/cache/conftool/dbconfig/20210419-084528-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15446 and previous config saved to /var/cache/conftool/dbconfig/20210419-084523-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 70%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15445 and previous config saved to /var/cache/conftool/dbconfig/20210419-083740-root.json
* 08:35 ema: restart debmonitor-client.service on cp4030, dns5002, an-worker1106 [[phab:T280484|T280484]]
* 08:34 marostegui: Testing log
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15444 and previous config saved to /var/cache/conftool/dbconfig/20210419-083021-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15443 and previous config saved to /var/cache/conftool/dbconfig/20210419-083018-root.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15442 and previous config saved to /var/cache/conftool/dbconfig/20210419-082559-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 60%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15441 and previous config saved to /var/cache/conftool/dbconfig/20210419-082236-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15440 and previous config saved to /var/cache/conftool/dbconfig/20210419-082000-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15439 and previous config saved to /var/cache/conftool/dbconfig/20210419-081517-root.json
* 08:07 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
* 08:07 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15438 and previous config saved to /var/cache/conftool/dbconfig/20210419-080732-root.json
* 08:07 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15437 and previous config saved to /var/cache/conftool/dbconfig/20210419-080456-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15436 and previous config saved to /var/cache/conftool/dbconfig/20210419-080454-root.json
* 08:03 moritzm: installing python-bleach security updates
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15435 and previous config saved to /var/cache/conftool/dbconfig/20210419-080013-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 40%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15434 and previous config saved to /var/cache/conftool/dbconfig/20210419-075229-root.json
* 07:51 moritzm: upgrade mwdebug2002 to PHP 7.2.34
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15433 and previous config saved to /var/cache/conftool/dbconfig/20210419-074953-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15432 and previous config saved to /var/cache/conftool/dbconfig/20210419-074950-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15431 and previous config saved to /var/cache/conftool/dbconfig/20210419-074510-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15430 and previous config saved to /var/cache/conftool/dbconfig/20210419-074155-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 30%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15429 and previous config saved to /var/cache/conftool/dbconfig/20210419-073725-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15428 and previous config saved to /var/cache/conftool/dbconfig/20210419-073449-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15427 and previous config saved to /var/cache/conftool/dbconfig/20210419-073446-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15426 and previous config saved to /var/cache/conftool/dbconfig/20210419-073425-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 20%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15425 and previous config saved to /var/cache/conftool/dbconfig/20210419-072221-root.json
* 07:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:19 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15424 and previous config saved to /var/cache/conftool/dbconfig/20210419-071943-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15423 and previous config saved to /var/cache/conftool/dbconfig/20210419-071921-root.json
* 07:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15422 and previous config saved to /var/cache/conftool/dbconfig/20210419-071701-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 15%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15421 and previous config saved to /var/cache/conftool/dbconfig/20210419-070718-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15420 and previous config saved to /var/cache/conftool/dbconfig/20210419-070439-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15419 and previous config saved to /var/cache/conftool/dbconfig/20210419-070418-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15418 and previous config saved to /var/cache/conftool/dbconfig/20210419-070035-marostegui.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15417 and previous config saved to /var/cache/conftool/dbconfig/20210419-065627-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15416 and previous config saved to /var/cache/conftool/dbconfig/20210419-065213-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15415 and previous config saved to /var/cache/conftool/dbconfig/20210419-064914-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15414 and previous config saved to /var/cache/conftool/dbconfig/20210419-064600-marostegui.json
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15413 and previous config saved to /var/cache/conftool/dbconfig/20210419-064123-root.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15412 and previous config saved to /var/cache/conftool/dbconfig/20210419-062620-root.json
* 06:17 _joe_: upgrading envoy everywhere in eqiad [[phab:T280317|T280317]]
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15411 and previous config saved to /var/cache/conftool/dbconfig/20210419-061116-root.json
* 06:10 _joe_: upgrading envoy everywhere in codfw [[phab:T280317|T280317]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15410 and previous config saved to /var/cache/conftool/dbconfig/20210419-060321-marostegui.json
* 06:01 _joe_: rolling out further envoy upgrades [[phab:T280317|T280317]]
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 10%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15409 and previous config saved to /var/cache/conftool/dbconfig/20210419-055613-root.json
* 05:53 marostegui: Stop sanitarium master on s2 (lag will show up on clouddb* labsdb* hosts) [[phab:T272008|T272008]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15408 and previous config saved to /var/cache/conftool/dbconfig/20210419-055240-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P15407 and previous config saved to /var/cache/conftool/dbconfig/20210419-054831-marostegui.json
* 05:42 marostegui: Stop sanitarium master on s1 (lag will show up on clouddb* labsdb* hosts) [[phab:T272008|T272008]]
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15406 and previous config saved to /var/cache/conftool/dbconfig/20210419-054158-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15405 and previous config saved to /var/cache/conftool/dbconfig/20210419-053730-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15404 and previous config saved to /var/cache/conftool/dbconfig/20210419-053127-marostegui.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1179 to dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15403 and previous config saved to /var/cache/conftool/dbconfig/20210419-053050-marostegui.json
* 05:05 marostegui: Restart m2 database master [[phab:T280251|T280251]]


== 2021-04-18 ==
== 2021-09-06 ==
* 06:40 Amir1: cleaning watchlist of User:Mr._Ibrahem in wikidatawiki (in main ns only)
* 23:52 tstarling@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/SecurePoll/includes/Talliers/STVTallier.php: [[phab:T290000|T290000]] (duration: 00m 58s)
* 16:14 Amir1: Deployed patch for [[phab:T290394|T290394]]
* 15:01 Emperor: removing pc1007 from orchestrator [[phab:T289118|T289118]]
* 15:00 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:53 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17226 and previous config saved to /var/cache/conftool/dbconfig/20210906-145341-kormat.json
* 14:50 Emperor: removing pc1007 from tendril and zarcillo [[phab:T289118|T289118]]
* 14:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1007.eqiad.wmnet
* 14:45 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1026.eqiad.wmnet
* 14:44 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1026.eqiad.wmnet
* 14:36 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 14:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1007.eqiad.wmnet
* 14:22 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 14:19 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715492{{!}}Set permission of creating short url to everyone everywhere (T267921 T267925)]], Part II (duration: 00m 57s)
* 14:17 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:715492{{!}}Set permission of creating short url to everyone everywhere (T267921 T267925)]], Part I (duration: 00m 59s)
* 14:12 moritzm: installing postgres 9.6 security updates
* 14:05 gehel: re-pooling wdqs1007, caught up on lag
* 13:56 jbond: update facter networking fact gerrit:715949
* 13:51 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:719118{{!}}ProductionServices: fix comment for rdb* servers]] (duration: 00m 58s)
* 13:42 moritzm: updated thirdparty/gitlab component to 14.0.10 [[phab:T284811|T284811]]
* 13:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:42 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 12:40 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 12:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:06 godog: silence statograph until thurs on alert1001 - [[phab:T290425|T290425]]
* 11:58 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=plwiki 'editor' 'editeditorprotected' # [[phab:T230103|T230103]]
* 11:56 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=<nowiki>{</nowiki>hewiki,lvwiki,srwiki,srwikibooks<nowiki>}</nowiki> 'autopatrol' 'editautopatrolprotected' # [[phab:T230103|T230103]]
* 11:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=etwiki 'autopatrol' 'editautopatrolprotected' # [[phab:T230103|T230103]]
* 11:50 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=dewiktionary 'autoreviewprotected' 'editautoreviewprotected' # [[phab:T230103|T230103]]
* 11:48 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=arwiki 'autoreview' 'editautoreviewprotected' # [[phab:T230103|T230103]]
* 11:07 urbanecm: EU B&C window done
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8d7cf8f7c3faaf3773940e96ba0cf599e725237}}: foundationwiki: Create editor group ([[phab:T205352|T205352]]) (duration: 00m 57s)
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f90862be8c7b540065da24c24f2e2ac0df5b9d07}}: Growth: Define wgGEMentorDashboardDiscoveryEnabled ([[phab:T289054|T289054]]) (duration: 00m 58s)
* 11:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/maintenance/renameRestrictions.php: {{Gerrit|18e43ecca7d25d2d93de2f98f3bf5b36f5d4b780}}: renameRestrictions.php: Update protected_titles as well ([[phab:T290398|T290398]]) (duration: 00m 59s)
* 10:39 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
* 10:38 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 10:22 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 10:17 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 09:22 gehel: depooling wdqs1007, catching up on lag
* 09:06 gehel: restart blazegraph and updater on wdqs1007
* 08:46 jbond: update networking fact - gerrit:715943
* 07:57 godog: fail sdw on ms-be1062, reported errors
* 07:51 moritzm: installing libssh security updates
* 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:44 moritzm: installing squashfs-tools security updates
* 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 06:28 marostegui: Optimize table mkwiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 06:26 marostegui: Optimize table bewiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 06:23 marostegui: Optimize table dewiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
* 05:07 marostegui: Stop replication on db2090 (old s4 master) [[phab:T289650|T289650]] [[phab:T288803|T288803]]
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 (current master) from API [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17223 and previous config saved to /var/cache/conftool/dbconfig/20210906-050502-marostegui.json
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2090 [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17222 and previous config saved to /var/cache/conftool/dbconfig/20210906-050419-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary and set section read-write [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17221 and previous config saved to /var/cache/conftool/dbconfig/20210906-050140-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17220 and previous config saved to /var/cache/conftool/dbconfig/20210906-050048-root.json
* 05:00 marostegui: Starting s4 codfw failover from db2090 to db2110 - [[phab:T289650|T289650]]
* 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17219 and previous config saved to /var/cache/conftool/dbconfig/20210906-040740-root.json
* 04:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 33 hosts with reason: Primary switchover s4 [[phab:T289650|T289650]]
* 04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 33 hosts with reason: Primary switchover s4 [[phab:T289650|T289650]]


== 2021-04-17 ==
== 2021-09-05 ==
* 16:16 Amir1: cleaning SuccuBot's watchlist in wikidatawiki
* 18:54 urbanecm: wikiadmin@10.192.0.119(ptwiki)> update protected_titles set pt_create_perm='editautoreviewprotected' where pt_create_perm='autoreviewer'; # [[phab:T290396|T290396]]
* 00:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
* 00:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1402.eqiad.wmnet
* 00:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1402.eqiad.wmnet
* 00:14 ryankemper: [[phab:T267927|T267927]] `sudo run-puppet-agent` and `sudo pool` on `wdqs2003`
* 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 00:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 00:08 ryankemper: [[phab:T267927|T267927]] Reload of `wdqs2003` complete
* 00:07 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE


== 2021-04-16 ==
== 2021-09-04 ==
* 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwdebug1003.eqiad.wmnet
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17217 and previous config saved to /var/cache/conftool/dbconfig/20210904-133532-root.json
* 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17216 and previous config saved to /var/cache/conftool/dbconfig/20210904-132029-root.json
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17215 and previous config saved to /var/cache/conftool/dbconfig/20210904-130525-root.json
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json
* 23:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mwdebug1003.eqiad.wmnet
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwdebug1003.eqiad.wmnet
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json
* 23:47 mutante: decom'ing mwdebug1003, stretch VM created in [[phab:T267248|T267248]]
* 09:04 elukey: restart wmf_auto_restart_rsyslog.service on puppetdb1002
* 23:39 mutante: reimaging last 3 remaining stretch appservers with buster, mw1307, mw1402, mw1403
* 09:00 elukey: `systemctl reset-failed ifup@ens6.service` on puppetdb2002 - [[phab:T273026|T273026]]
* 23:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
* 03:02 rzl@cumin2001: dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json
* 23:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
* 21:08 ejegg: updated fundraising python tools from {{Gerrit|ef54260b0d}} to {{Gerrit|3d950fffbd}}
* 20:40 Trey314159: reindexing wikidata on cloudelastic... AGAIN ([[phab:T274200|T274200]])
* 17:48 ryankemper: [[phab:T267927|T267927]] Transferring from `wdqs2008`->`wdqs2003` to resolve the data corruption on `wdqs2003`
* 17:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
* 17:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
* 17:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
* 17:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
* 17:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
* 17:35 mutante: depooling mwdebug1003 (stretch VM, will be removed), mwdebug1001/1002 (buster) are unchanged
* 17:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1003.eqiad.wmnet
* 17:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
* 17:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
* 17:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
* 17:03 ryankemper: [[phab:T267927|T267927]] Pooled `wdqs1007`, `wdqs2003`, `wdqs1008`, `wdqs2004`
* 17:00 ryankemper: [[phab:T267927|T267927]] Following data transfers complete: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:59 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:09 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:57 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 15:22 urbanecm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
* 14:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
* 14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
* 14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
* 14:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
* 14:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
* 14:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
* 14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
* 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
* 14:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
* 14:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2020.codfw.wmnet
* 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
* 13:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2019.codfw.wmnet
* 12:59 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
* 12:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
* 12:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
* 12:47 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 12:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
* 12:41 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
* 12:37 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 12:25 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 12:22 jayme: updated envoyproxy to 1.15.4-1 on 'A:mw-canary or A:restbase-canary'
* 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
* 11:02 moritzm: imported ferm 2.5.1-1+wmf1 to bullseye-wikimedia/main [[phab:T275873|T275873]]
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
* 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
* 10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
* 10:44 arturo: merging homer change to cr-eqiad ([[phab:T279342|T279342]])
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
* 10:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
* 10:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
* 10:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
* 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
* 10:08 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
* 10:08 jayme: updated envoyproxy to 1.15.4-1 on mw1325.eqiad.wmnet,restbase1026.eqiad.wmnet
* 10:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 10:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2011.codfw.wmnet
* 10:03 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 10:00 jayme: updated envoyproxy to 1.15.4-1 on mwdebug1001.eqiad.wmnet
* 09:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2011.codfw.wmnet
* 09:55 jayme: imported envoyproxy_1.15.4-1 to stretch-wikimedia - [[phab:T280317|T280317]]
* 09:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2010.codfw.wmnet
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15384 and previous config saved to /var/cache/conftool/dbconfig/20210416-093446-root.json
* 09:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2010.codfw.wmnet
* 09:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2009.codfw.wmnet
* 09:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2009.codfw.wmnet
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15383 and previous config saved to /var/cache/conftool/dbconfig/20210416-091942-root.json
* 09:13 jayme: imported envoyproxy_1.15.4-1 to buster-wikimedia - [[phab:T280317|T280317]]
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15380 and previous config saved to /var/cache/conftool/dbconfig/20210416-090438-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15374 and previous config saved to /var/cache/conftool/dbconfig/20210416-084935-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15373 and previous config saved to /var/cache/conftool/dbconfig/20210416-083431-root.json
* 07:53 elukey: run reprepro --delete clearvanished on apt1001 to clear all cloudera packages
* 07:41 ema: cp-upload_ulsfo: rolling varnish-frontend-restart to apply exp policy settings changes starting from empty caches [[phab:T275809|T275809]]
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P15372 and previous config saved to /var/cache/conftool/dbconfig/20210416-071936-marostegui.json
* 06:58 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
* 06:52 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
* 06:48 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
* 06:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
* 06:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
* 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2095.codfw.wmnet with reason: REIMAGE
* 06:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
* 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2095.codfw.wmnet with reason: REIMAGE
* 05:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics-tool1001.eqiad.wmnet
* 05:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2094.codfw.wmnet with reason: REIMAGE
* 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2094.codfw.wmnet with reason: REIMAGE
* 05:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics-tool1001.eqiad.wmnet
* 03:31 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
* 03:26 ryankemper: [[phab:T267927|T267927]] Pooled `wdqs2001`
* 03:22 ryankemper: [[phab:T267927|T267927]] Pooled `wdqs1006` and `wdqs2002`
* 03:09 ryankemper: [[phab:T267927|T267927]] kicked off next round of `data-transfer`s: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
* 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 03:05 ryankemper: [[phab:T267927|T267927]] Last round of `data-transfer`s finished successfully, proceeding to next round
* 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:30 Krinkle: Delete old data at doc1001:/srv/doc/cover/PasswordBlacklist (ref [[phab:T254799|T254799]])
* 00:09 jforrester@deploy1002: Finished deploy [integration/docroot@63b6fb6]: Sync with CI updates (no-op) (duration: 00m 08s)
* 00:09 jforrester@deploy1002: Started deploy [integration/docroot@63b6fb6]: Sync with CI updates (no-op)


== 2021-04-15 ==
== 2021-09-03 ==
* 23:37 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/skins/Vector/skin.json: Backport: [[gerrit:679842{{!}}Adjust floating override (T280260)]] (duration: 00m 56s)
* 21:49 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:35 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/skins/Vector/resources/skins.vector.styles.legacy/layouts/screen.less: Backport: [[gerrit:679842{{!}}Adjust floating override (T280260)]] (duration: 00m 56s)
* 20:30 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:31 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: [[gerrit:679845{{!}}searchSatisfaction: Default userEditBucket back to 0 edits (T280294)]] (duration: 00m 57s)
* 19:33 krinkle@deploy1002: Finished deploy [integration/docroot@6492b3d]: {{Gerrit|I48480e89e5f6}} (duration: 00m 10s)
* 23:17 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:679947{{!}}Create Draft namespace on itwiki (T280289)]] (duration: 00m 56s)
* 19:33 krinkle@deploy1002: Started deploy [integration/docroot@6492b3d]: {{Gerrit|I48480e89e5f6}}
* 23:09 jforrester@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:678342{{!}}[wikitech] Update logo to mirror the new MediaWiki logo (T279087)]] (duration: 00m 56s)
* 19:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:08 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: [[gerrit:678342{{!}}[wikitech] Update logo to mirror the new MediaWiki logo (T279087)]] (duration: 00m 56s)
* 19:04 ryankemper: [[phab:T290330|T290330]] `ryankemper@cumin1001:~$ sudo -E cumin 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki>' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job)
* 23:07 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [[gerrit:678342{{!}}[wikitech] Update logo to mirror the new MediaWiki logo (T279087)]] (duration: 00m 57s)
* 17:42 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:06 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: [[gerrit:678342{{!}}[wikitech] Update logo to mirror the new MediaWiki logo (T279087)]] (duration: 00m 57s)
* 17:40 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:56 ryankemper: [[phab:T267927|T267927]] WDQS kicked off next round of `data-transfer`s: `wdqs1004`->`wdqs1006`, `wdqs2001`->`wdqs2002`, `wdqs2008`->`wdqs1003`
* 17:35 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:17 ryankemper: [[phab:T290330|T290330]] Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw
* 22:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 16:32 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 16:10 gehel: blazegraph (public codfw cluster) will now restart every hour - [[phab:T290330|T290330]]
* 22:48 ryankemper: [[phab:T267927|T267927]] pooled `wdqs1005` (all caught up on lag)
* 15:53 jbond: enable puppet fleet wide post puppetdb database maintenance - [[phab:T263578|T263578]]
* 22:46 ryankemper: [[phab:T280108|T280108]] [[phab:T267927|T267927]] Manually re-enabled and ran puppet on `wdqs1005` (had closed the tmux pane which terminated the cookbook without letting it do its final cleanup)
* 15:21 jbond: create lvm snapshot puppetdb2002_data_snapshot on ganeti2023 - [[phab:T263578|T263578]]
* 22:33 ryankemper: [[phab:T280108|T280108]] [[phab:T267927|T267927]] Data transfers completed successfully; small issue with new `wait_for_updater` logic is preventing termination so I ctrl+c'd manually
* 15:17 jbond: create lvm snapshot puppetdb1002_data_snapshot on ganeti1012 - [[phab:T263578|T263578]]
* 22:32 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 15:00 jbond: disable puppet fleet wide to perform puppetdb database maintenance - [[phab:T263578|T263578]]
* 20:03 herron: migrating kafka-logging broker logstash1012 to kafka-logging1003 [[phab:T279342|T279342]]
* 14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:56 Trey314159: reindexing wikidata on cloudelastic finished/failed ([[phab:T274200|T274200]])
* 14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:43 Trey314159: reindexing wikidata on cloudelastic ([[phab:T274200|T274200]])
* 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:42 Trey314159: reindexing commons and wikidata on elastic@eqiad ([[phab:T274200|T274200]])
* 14:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:20 mutante: mw2264 - scap pull
* 19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.1  refs [[phab:T278345|T278345]]
* 14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:49 andrew@deploy1002: Finished deploy [horizon/deploy@ec37c43]: test deploy of trove dashboard to codfw1dev (duration: 01m 58s)
* 13:11 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 18:47 andrew@deploy1002: Started deploy [horizon/deploy@ec37c43]: test deploy of trove dashboard to codfw1dev
* 13:10 dcausse: installing openjdk-8-dbg on wdqs2007
* 18:39 jdrewniak@deploy1002: Synchronized private/readme.php: Config: [[gerrit:679614{{!}}Add $wgWMEVectorPrefDiffSalt to private/readme (T261842)]] (duration: 01m 08s)
* 13:04 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 18:32 jdrewniak@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:679613{{!}}Add mediawiki.pref_diff stream to wgEventLoggingStreamNames/wgEventStreams (T261842)]] (duration: 01m 18s)
* 13:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1023.eqiad.wmnet
* 17:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1023.eqiad.wmnet
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1035-1036].eqiad.wmnet
* 16:42 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:32 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1035-1036].eqiad.wmnet
* 16:34 crusnov@cumin1001: START - Cookbook sre.dns.netbox
* 12:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1028-1032].eqiad.wmnet
* 16:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
* 12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d] (duration: 00m 06s)
* 16:21 ryankemper: [[phab:T280108|T280108]] [[phab:T267927|T267927]] Current wdqs transfers in progress: `wdqs1004`->`wdqs1005`, `wdqs2008`->`wdqs2001`
* 12:03 joal@deploy1002: Started deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d]
* 16:21 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
* 12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d] (duration: 19m 16s)
* 16:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 11:56 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 19m 21s)
* 16:17 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
* 11:44 joal@deploy1002: Started deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d]
* 16:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 11:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from enwiki - [[phab:T289050|T289050]]
* 16:17 ryankemper: [[phab:T280108|T280108]] [[phab:T267927|T267927]] Merged https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/679702 and ran puppet-agent on `cumin2001` before next round of wdqs `data-transfer`s
* 11:37 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
* 16:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
* 11:36 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 01m 07s)
* 16:08 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
* 11:35 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
* 16:02 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
* 10:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1028-1032].eqiad.wmnet
* 15:26 otto@deploy1002: Finished deploy [analytics/refinery@497f6a5] (hadoop-test): (no justification provided) (duration: 04m 44s)
* 10:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc[1025-1026].eqiad.wmnet
* 15:21 otto@deploy1002: Started deploy [analytics/refinery@497f6a5] (hadoop-test): (no justification provided)
* 10:47 joal@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures (duration: 00m 32s)
* 15:09 elukey@deploy1002: Finished deploy [analytics/refinery@497f6a5]: Regular analytics weekly train (duration: 13m 12s)
* 10:46 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures
* 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1002.wikimedia.org
* 10:45 joal@deploy1002: deploy aborted: Deploy latest code on AQS new servers - test after failures (duration: 00m 05s)
* 15:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns1002.wikimedia.org
* 10:45 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-test): Deploy latest code on AQS new servers - test after failures
* 14:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1001.wikimedia.org
* 10:29 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 03s)
* 14:56 elukey@deploy1002: Started deploy [analytics/refinery@497f6a5]: Regular analytics weekly train
* 10:29 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 14:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns1001.wikimedia.org
* 10:22 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 55s)
* 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5002.wikimedia.org
* 10:21 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 14:47 jayme: imported etcd-mirror_0.0.5-1 to buster-wikimedia
* 10:17 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
* 14:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns5002.wikimedia.org
* 10:16 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5001.wikimedia.org
* 10:08 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 45s)
* 14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1048.eqiad.wmnet with reason: REIMAGE
* 10:08 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 14:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1047.eqiad.wmnet with reason: REIMAGE
* 10:05 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
* 14:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1048.eqiad.wmnet with reason: REIMAGE
* 10:04 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 14:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns5001.wikimedia.org
* 10:02 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 25s)
* 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1046.eqiad.wmnet with reason: REIMAGE
* 10:01 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 14:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1047.eqiad.wmnet with reason: REIMAGE
* 10:00 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 53s)
* 14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2002.wikimedia.org
* 09:58 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 14:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1046.eqiad.wmnet with reason: REIMAGE
* 09:57 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 09s)
* 14:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
* 09:57 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2001.wikimedia.org
* 09:32 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979] (duration: 00m 07s)
* 14:19 ppchelko@deploy1002: Finished deploy [restbase/deploy@4755f50]: [[phab:T271983|T271983]], try again (duration: 07m 45s)
* 09:32 joal@deploy1002: Started deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979]
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns2001.wikimedia.org
* 09:26 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979] (duration: 17m 36s)
* 14:17 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
* 09:25 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1025-1026].eqiad.wmnet
* 14:12 ppchelko@deploy1002: Started deploy [restbase/deploy@4755f50]: [[phab:T271983|T271983]], try again
* 09:15 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 14:11 ppchelko@deploy1002: Finished deploy [restbase/deploy@4755f50]: [[phab:T271983|T271983]] (duration: 11m 15s)
* 09:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1022.eqiad.wmnet
* 14:09 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
* 09:13 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:00 ppchelko@deploy1002: Started deploy [restbase/deploy@4755f50]: [[phab:T271983|T271983]]
* 09:09 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:56 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=wtp104[5-7].eqiad.wmnet
* 09:09 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
* 09:09 joal@deploy1002: Started deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979]
* 13:54 andrewbogott: upgrading packages and mediawiki on wikitech-static
* 09:08 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4002.wikimedia.org
* 09:06 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
* 09:03 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4002.wikimedia.org
* 09:03 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
*