You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0 (duration: 00m 11s))
imported>Stashbot
(andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye)
 
(762 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== 2020-06-05 ==
== 2022-09-25 ==
* 16:45 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0 (duration: 00m 11s)
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 16:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 16:29 bd808: Testing stashbot following hard restart of service. It was having LDAP connection failure problems.
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 16:00 AndyRussG: Turned off Fundraising job recurring_smashpig_charge
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 15:54 cdanis: enabling & rerunning puppet on netflow* [[phab:T254574|T254574]]
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 15:39 cdanis: disabling puppet on netflow* and trying {{Gerrit|I6598d8f8}} on netflow3001 first [[phab:T254574|T254574]]
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 15:39 cdanis: disabling puppet on netflow* and trying {{Gerrit|I6598d8f8}} on netflow3001 first
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 13:33 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 13:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 13:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 13:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 13:18 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 13:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 12:55 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Hotfix for be-tarask interwiki link being broken ([[phab:T111853|T111853]]) (duration: 01m 00s)
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 12:41 mutante: rebooting gerrit1002 to add more vCPUs, after [ganeti1009:~] $ sudo gnt-instance modify -B vcpus=8 gerrit1002.wikimedia.org [[phab:T239151|T239151]]
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 12:20 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 12:17 akosiaris: update blubberoid changeprop changeprop-jobqueue citoid cxserver wikifeeds zotero in staging to latest charts
* 12:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 12:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:17 akosiaris: fix typo in ganeti2016 /etc/network/interfaces and reboot
* 11:28 akosiaris: master-failover from ganeti2001 to ganeti2019 for ganeti01.svc.codfw.wmnet
* 11:25 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:25 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:14 mutante: running puppet on all ganeti nodes
* 11:05 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
* 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 09:46 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:25 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 09:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:44 akosiaris: reimage ganeti2016 for stretch
* 08:42 akosiaris: migrate mx2001.wikimedia.org to new ganeti nodes
* 08:40 akosiaris: migrate acrab to new ganeti nodes
* 08:38 akosiaris: failover master IP from ganeti1003 to ganeti1009
* 08:37 akosiaris: empty ganeti100<nowiki>{</nowiki>1,2,3,4<nowiki>}</nowiki>. Move all VMs to new ganeti nodes
* 08:28 akosiaris: migrate seaborgium.wikimedia.org to new ganeti nodes
* 08:27 akosiaris: migrate etherpad1002 to new ganeti nodes
* 08:11 marostegui: Upgrade db2075 to 10.1.45
* 07:52 vgutierrez: rolling restart of ats-tls - [[phab:T249335|T249335]]
* 07:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 06:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:17 elukey@cumin1001: START - Cookbook sre.hosts.downtime


== 2020-06-04 ==
== 2022-09-23 ==
* 23:45 catrope@deploy1001: Synchronized wmf-config/mc.php: Set coalesceKeys=non-global for WANCache on enwiki (duration: 00m 59s)
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 23:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Minerva site notices on Wikivoyage wiis ([[phab:T254391|T254391]]) (duration: 00m 58s)
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 23:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set guwiki timezone to Asia/Kolkata ([[phab:T253827|T253827]]) (duration: 00m 57s)
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 23:17 catrope@deploy1001: Synchronized static/images/: Change logo for zhwiki ([[phab:T254467|T254467]]) (duration: 01m 00s)
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 22:56 ryankemper: re-enabled puppet on `cloudelastic1006`. All `cloudelastic` instances now have puppet enabled and are in sync
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 20:56 ryankemper: enabled puppet on `cloudelastic1005` in order to kick off a puppet run and verify that this new node joins the ES cluster properly
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:39 ryankemper: disabled puppet on `cloudelastic100[5,6]` which are two racked nodes that we are now bringing into service. Will re-enable after successful puppet-merge / elasticsearch cluster join
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 ryankemper: disabled puppet on `cloudelastic100[5,6]` which are two racked nodes that we are now bringing into service. Will re-enable after successful puppet-merge / elasticsearch cluster join
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.35
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:12 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1004.eqiad.wmnet
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 15:10 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 15:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 15:07 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 15:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 14:37 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 14:36 moritzm: installing libexif security updates on jessie
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 14:08 moritzm: installing clamav security updates on mendelevium (ticket.wikimedia.org)
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 14:00 qchris: Stopping puppet on gerrit1002 (gerrit-test) to run tests for Gerrit upgrade
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 13:41 moritzm: bounced ferm on ms-be1023
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 13:35 moritzm: installing exim security updates on jessie (stretch/buster already done)
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 12:54 urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: c06e720: Revert "wgNamespaceRobotPolicies: thwiki: Add 100 NS to noindex" (T253574) (duration: 01m 06s)
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 12:18 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:14 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 12:02 moritzm: upgrading mw1276 to PHP 7.2.31
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11396 and previous config saved to /var/cache/conftool/dbconfig/20200604-115933-marostegui.json
* 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ec07467}}: wgNamespaceRobotPolicies: thwiki: Add 100 NS to noindex ([[phab:T253574|T253574]]) (duration: 01m 15s)
* 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|338cb90}}: {{Gerrit|1ade16f}}: Change $wgNamespaceRobotPolicies on Thai wikis ([[phab:T253578|T253578]]; [[phab:T253577|T253577]]; [[phab:T253576|T253576]]; [[phab:T253575|T253575]]; [[phab:T253574|T253574]]) (duration: 01m 07s)
* 11:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11395 and previous config saved to /var/cache/conftool/dbconfig/20200604-114149-marostegui.json
* 11:29 marostegui: Compress InnoDB on db1091 before pooling it as new slave on s1 - [[phab:T254462|T254462]]
* 11:21 hashar@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [metawiki] Add `centralauth-rename` to WMF OIT staff - [[phab:T254372|T254372]] (duration: 01m 08s)
* 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 11:04 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:59 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:53 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:53 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet
* 10:46 marostegui: Deploy schema change on s3 (only testwiki) eqiad - [[phab:T238966|T238966]]
* 10:42 marostegui: Deploy schema change on s3 (only testwiki) codfw - [[phab:T238966|T238966]]
* 10:41 jbond42: deployed new version of puppet-merge revert is https://gerrit.wikimedia.org/r/c/operations/puppet/+/602329
* 09:57 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:51 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:51 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:50 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:46 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:42 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 09:42 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 09:41 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 09:41 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 09:26 moritzm: rolling restart of cassandra on maps* to pick up Java security updates
* 09:09 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:04 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:03 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:03 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:03 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:03 moritzm: deploying Java security updates on elastic search nodes
* 09:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:58 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:50 marostegui: Repool labsdb1009 after running maintain-views [[phab:T252219|T252219]]
* 08:42 moritzm: restarting archiva to pick up Java security updates
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to clone db1091 on s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11392 and previous config saved to /var/cache/conftool/dbconfig/20200604-081545-marostegui.json
* 08:14 marostegui: Run sudo /usr/local/sbin/maintain-views --all-databases --replace-all on labsdb1009 - [[phab:T252219|T252219]]
* 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:45 marostegui: Depool labsdb1009 - [[phab:T252219|T252219]]
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:33 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=labweb,service=labweb-ssl
* 07:32 oblivian@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=cloudceph,service=cloudceph
* 06:52 mutante: mwmaint1002 started mediawiki_job_cirrus_build_completion_indices_eqiad.service
* 06:06 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: name=logstash200.*
* 06:05 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: name=logstash100.*
* 06:04 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=eventschemas,service=eventschemas
* 06:02 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch.*
* 06:01 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch
* 06:00 _joe_: fixing weights of cp2040 [[phab:T245594|T245594]]
* 05:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 00:36 reedy@deploy1001: Synchronized php-1.35.0-wmf.35/includes/specials/SpecialUserrights.php: [[phab:T254417|T254417]] [[phab:T251534|T251534]] (duration: 01m 06s)


== 2020-06-03 ==
== 2022-09-22 ==
* 23:08 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: [[phab:T249834|T249834]] (duration: 01m 06s)
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 23:06 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[phab:T249834|T249834]] (duration: 01m 06s)
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 22:22 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:54 jforrester@deploy1001: rebuilt and synchronized wikiversions files: Re-rolling group1 to 1.35.0-wmf.35 for [[phab:T253023|T253023]]
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:49 jforrester@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/EventStreamConfig/includes/ApiStreamConfigs.php: [[phab:T254390|T254390]] ApiStreamConfigs: If the 'constraints' parameter is unset, don't explode (duration: 01m 06s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:43 cstone: civicrm revision changed from {{Gerrit|63508b01b9}} to {{Gerrit|11b0e7c7e5}}
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:16 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 21:23 dancy@deploy1002: backport aborted:  (duration: 00m 05s)
* 21:15 ryankemper: The previously ran `_cluster/reroute?retry_failed=true` command worked as intended, the two shards in question have recovered and we're back to green cluster status. We're now in a known state and ready to proceed with the eqiad rolling upgrade
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:13 ryankemper: Ran `curl -X POST "https://localhost:9243/_cluster/reroute?pretty&retry_failed=true&explain=true" -H 'Content-Type: application/json' -d '<nowiki>{</nowiki><nowiki>}</nowiki>' --insecure` via the ssh tunnel `ssh bast4002.wikimedia.org -L 9243:search.svc.eqiad.wmnet:9243 -L 9443:search.svc.eqiad.wmnet:9443 -L 9643:search.svc.eqiad.wmnet:9643`, two unassigned shards are now initializing
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:05 ryankemper: Elasticsearch Eqiad was in yellow cluster status before starting the above cookbook run (therefore the run was a no-op until I ctlr+C'd), going to try unsticking the two unassigned shards via `/_cluster/reroute?retry_failed=true`
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:03 ryankemper@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
* 20:55 brennen: end of utc late backport & config window
* 20:58 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 20:49 eileen: civicrm revision changed from {{Gerrit|eb156dffa4}} to {{Gerrit|63508b01b9}}, config revision is {{Gerrit|95dcdb0a8a}}
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 20:47 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 20:19 gehel: elasticsearch cluster restart stopped
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:18 ryankemper@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 19:35 ppchelko@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 20:36 brennen@deploy1002: backport aborted:  (duration: 02m 16s)
* 19:35 ppchelko@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:33 ppchelko@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:32 ppchelko@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:30 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:29 ppchelko@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:29 ppchelko@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:20 jforrester@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to wmf.34 [[phab:T253023|T253023]]
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:16 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:15 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 19:14 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.35 (duration: 01m 05s)
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 19:13 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.35
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 19:05 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: [[phab:T32405|T32405]] Stop special casing the main page on another 47 projects (duration: 01m 08s)
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:01 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 601843 Enable talk pages on Swedish Minerva (duration: 01m 08s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:59 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:56 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:55 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 601842 - Disable growth survey (duration: 01m 06s)
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:49 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: gerrit 596277 Use AddFooterLink hook for code of conduct and contact links (duration: 01m 05s)
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 18:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 599150 - enable kafka purges for group0 (duration: 01m 06s)
* 18:38 dancy@deploy1002: Started scap: testing
* 18:19 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: gerrit 570396 - enable kask-session everywhere. CS.php (duration: 01m 05s)
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 18:14 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 570396 - enable kask-session everywhere. IS.php (duration: 01m 06s)
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 17:15 ejegg: updated payments-wiki from {{Gerrit|e46114d8b1}} to {{Gerrit|c1d14a5db7}}
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 17:08 elukey: ganeti: gnd-instance reboot an-launcher1001 to get new memory settings - [[phab:T254125|T254125]]
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 15:21 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 15:19 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 15:12 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 14:50 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after reimaging [[phab:T252182|T252182]] (duration: 01m 06s)
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 14:47 moritzm: updated grafana on cloudmetrics* to 6.7.4
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:26 kormat: stopping replication on pc1010
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:16 gehel: cleaning commonsrdf-dumps cron entry manually on snapshot1008
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:00 hashar: Restarted CI Jenkins for plugin update
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Replace pc1009 with pc1010 reimaging [[phab:T252182|T252182]] (duration: 01m 06s)
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:47 kormat: reimaging *pc1009 (promise) to buster [[phab:T252182|T252182]]
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:44 kormat: reimaging pc1007 to buster, wish me luck [[phab:T252182|T252182]]
* 16:39 dancy@deploy1002: Sync cancelled.
* 13:20 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:13 kormat@deploy1001: Synchronized wmf-config/db-codfw.php: Put pc2009 back into pc3 after reimaging [[phab:T252182|T252182]] (duration: 01m 05s)
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 13:01 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1080', diff saved to https://phabricator.wikimedia.org/P11385 and previous config saved to /var/cache/conftool/dbconfig/20200603-120136-marostegui.json
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:57 moritzm: updating linux-libc-dev on stretch and buster hosts
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:56 XioNoX: configure management-instance on cr1/2-eqiad - [[phab:T247073|T247073]]
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:51 XioNoX: configure management-instance on cr2-codfw - [[phab:T247073|T247073]]
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2124 after MCR schema change', diff saved to https://phabricator.wikimedia.org/P11384 and previous config saved to /var/cache/conftool/dbconfig/20200603-114409-marostegui.json
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P11383 and previous config saved to /var/cache/conftool/dbconfig/20200603-114351-marostegui.json
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:31 Lucas_WMDE: EU SWAT done
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:30 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php eswiki --fix {{!}} tee [[phab:T254077|T254077]].fix
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 11:29 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php eswiki {{!}} tee [[phab:T254077|T254077]].dry-run
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:27 moritzm: installing rubygems-integration updates for Buster
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:600979{{!}}[eswiki] Normalize talk namespaces for Anexo, Portal and Wikiproyecto (T254077)]] (duration: 01m 03s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:25 moritzm: install brltty updates on Buster
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 11:23 XioNoX: configure management-instance on cr1-codfw - [[phab:T247073|T247073]]
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:19 XioNoX: configure management-instance on cr1-eqsin - [[phab:T247073|T247073]]
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 11:15 moritzm: installing python-oslo.utils security updates
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 11:12 XioNoX: remove unused logical-systems from all MX204 routers - [[phab:T247073|T247073]]
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P11382 and previous config saved to /var/cache/conftool/dbconfig/20200603-111055-marostegui.json
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 11:08 marostegui: Add rev_id to revision table on db2124 - [[phab:T238966|T238966]]
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 11:05 moritzm: installing pango updates for buster
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 - will be reimaged and moved to s1 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11381 and previous config saved to /var/cache/conftool/dbconfig/20200603-104251-marostegui.json
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P11380 and previous config saved to /var/cache/conftool/dbconfig/20200603-101426-marostegui.json
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P11378 and previous config saved to /var/cache/conftool/dbconfig/20200603-093810-marostegui.json
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:10 moritzm: upgrading mw1262-1265 to PHP 7.2.31
* 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:04 marostegui: Reimage db1080
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for reimage', diff saved to https://phabricator.wikimedia.org/P11376 and previous config saved to /var/cache/conftool/dbconfig/20200603-090143-marostegui.json
* 08:42 kormat@deploy1001: Synchronized wmf-config/db-codfw.php: Replace pc2009 with pc2010 while reimaging (duration: 01m 16s)
* 08:19 moritzm: upgrading mw1261 to PHP 7.2.31
* 08:17 XioNoX: re-add ae2 physical interfaces to external group - [[phab:T253970|T253970]]
* 08:09 moritzm: upgrading remaining mwdebug* servers to PHP 7.2.31
* 08:08 kormat: reimaging pc2009 to buster [[phab:T252182|T252182]]
* 08:08 XioNoX: remove ae2 physical interfaces from external group - [[phab:T253970|T253970]]
* 07:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 07:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 07:44 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw218[0-6].codfw.wmnet
* 07:36 mutante: depooling mw2180 - mw2186
* 07:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw218[0-6].codfw.wmnet
* 07:35 moritzm: imported PHP 7.2.31 to apt.wikimedia.org/component/php72
* 07:33 ema: cp: upgrade purged to 0.15
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071 after cloning it from db2130 to restore all the schema changes applied', diff saved to https://phabricator.wikimedia.org/P11375 and previous config saved to /var/cache/conftool/dbconfig/20200603-072841-marostegui.json
* 07:15 XioNoX: repool esams - [[phab:T254021|T254021]]
* 07:09 XioNoX: re-activate peering/transit BGP on cr2-esams - [[phab:T254021|T254021]]
* 07:00 XioNoX: re0.cr2-esams> request system reboot both-routing-engines - [[phab:T254021|T254021]]
* 06:56 XioNoX: deactivate peering/transit BGP cr2-esams - [[phab:T244497|T244497]]
* 06:54 XioNoX: failover vrrp to cr3-esams - [[phab:T244497|T244497]]
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1138 [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11374 and previous config saved to /var/cache/conftool/dbconfig/20200603-063752-marostegui.json
* 06:18 XioNoX: cr3-esams> request chassis routing-engine master switch - [[phab:T244497|T244497]]
* 06:11 XioNoX: cr3-esams> request vmhost reboot re1 (backup re) - [[phab:T244497|T244497]]
* 06:08 XioNoX: re-activate transit BGP to cr3-knams - [[phab:T254021|T254021]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138 [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11373 and previous config saved to /var/cache/conftool/dbconfig/20200603-060124-marostegui.json
* 05:58 XioNoX: reboot cr3-knams - [[phab:T254021|T254021]]
* 05:51 XioNoX: deactivate transit BGP ton cr3-knams - [[phab:T254021|T254021]]
* 05:48 XioNoX: depool esams - [[phab:T254021|T254021]]
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130 to clone db2071', diff saved to https://phabricator.wikimedia.org/P11371 and previous config saved to /var/cache/conftool/dbconfig/20200603-054117-marostegui.json
* 05:40 marostegui: Stop MySQL on db2130 to clone db2071
* 05:38 XioNoX: deactivate graceful-switchover on cr3-esams - [[phab:T254021|T254021]]
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138 [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11370 and previous config saved to /var/cache/conftool/dbconfig/20200603-053748-marostegui.json
* 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:14 XioNoX: turn cr1-codfw:fpc0 online - [[phab:T254110|T254110]]
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138 [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11369 and previous config saved to /var/cache/conftool/dbconfig/20200603-050911-marostegui.json
* 01:00 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: wgLocalVirtualHosts (duration: 01m 06s)
* 00:59 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|Ic27b605aec0118c5}} (duration: 01m 11s)


== 2020-06-02 ==
== 2022-09-21 ==
* 23:58 ejegg: updated fundraising CiviCRM from {{Gerrit|657c4b9455}} to {{Gerrit|eb156dffa4}}
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:55 ejegg: updated payments-wiki from {{Gerrit|1942a537ef}} to {{Gerrit|e46114d8b1}}
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:48 cstone: civicrm revision changed from {{Gerrit|d1cd99166f}} to {{Gerrit|657c4b9455}}
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:48 reedy@deploy1001: Synchronized wmf-config/interwiki-labs.php: laaaaabs (duration: 01m 05s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:23 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: beta apiportalwiki [[phab:T254185|T254185]] (duration: 01m 06s)
* 20:46 tgr_: UTC late deploys done
* 21:21 reedy@deploy1001: Synchronized wmf-config/config/apiportalwiki.yaml: beta apiportalwiki [[phab:T254185|T254185]] (duration: 01m 05s)
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:20 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta apiportalwiki [[phab:T254185|T254185]] (duration: 01m 05s)
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 21:19 reedy@deploy1001: Synchronized wikiversions-labs.json: beta apiportalwiki [[phab:T254185|T254185]] (duration: 01m 05s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:17 reedy@deploy1001: Synchronized dblists/all-labs.dblist: beta apiportalwiki [[phab:T254185|T254185]] (duration: 01m 06s)
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:12 cdanis: repooled wtp1032 [[phab:T254258|T254258]]
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:24 reedy@deploy1001: Synchronized composer.lock: Update (duration: 01m 06s)
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 20:02 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.35
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:59 jforrester@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.35 (duration: 93m 52s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:25 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.35
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:22 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.31 (duration: 19m 59s)
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:05 cdanis: fixing g+w permissions of deploy1001 /srv/mediawiki-staging/php-*/.git/objects/*
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 17:20 James_F: 1.35.0-wmf.35 was branched at {{Gerrit|8d7015037d44c4fe21eee8e8a040720b838bc169}} for [[phab:T253023|T253023]]
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:55 cstone: SmashPig revision changed from {{Gerrit|44690f761c}} to {{Gerrit|b9de3c7aac}}
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:57 ejegg: updated payments-wiki from {{Gerrit|d11efeb1cf}} to {{Gerrit|1942a537ef}}
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:50 cdanis: thumbor1003 and thumbor1004 blipped, no obvious explanation, logs gathered at P11365 P11366 P11367
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:49 XioNoX: push frack fw rules - [[phab:T254260|T254260]]
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 15:48 mutante: contint1001 - rm -rf /mnt/docker ([[phab:T224591|T224591]])
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:45 mutante: contint1001 - restarting docker afer changed data-root path ([[phab:T224591|T224591]])
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 15:37 cdanis@cumin1001: conftool action : set/pooled=no; selector: name=wtp1032.*
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:35 cdanis: power cycling wtp1032 which is bootlooping? https://phabricator.wikimedia.org/P11364
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:31 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 15:24 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor100[34].*
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:23 XioNoX: repool codfw - [[phab:T254216|T254216]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:19 XioNoX: rollback ospf changes - [[phab:T254216|T254216]]
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:09 hnowlan@deploy1001: Finished deploy [cpjobqueue/deploy@8a53ff1]: (no justification provided) (duration: 02m 33s)
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 15:07 XioNoX: reboot cr1-codfw:fpc5 - [[phab:T254216|T254216]]
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:06 hnowlan@deploy1001: Started deploy [cpjobqueue/deploy@8a53ff1]: (no justification provided)
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 15:05 hnowlan: shifting all high traffic cpjobqueue rules to k8s
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 14:57 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 14:57 XioNoX: depref ulsfo-codfw link - [[phab:T254216|T254216]]
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:51 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:50 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:49 jynus@cumin1001: START - Cookbook sre.hosts.downtime
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 XioNoX: prefer eqsin-ulsfo tunnel - [[phab:T254216|T254216]]
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 14:47 cdanis@cumin1001: conftool action : set/pooled=no; selector: name=thumbor100[34].*
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:38 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:31 XioNoX: depool codfw - [[phab:T254216|T254216]]
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 13:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 13:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 13:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 13:28 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 13:19 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:18 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:18 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:05 cdanis@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/ContribsPager.php: revert contribs limit to 5000 [[phab:T234450|T234450]] (duration: 00m 57s)
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 13:04 cdanis@deploy1001: Synchronized php-1.35.0-wmf.32/includes/specials/pagers/ContribsPager.php: revert contribs limit to 5000 [[phab:T234450|T234450]] (duration: 00m 57s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:03 cdanis@deploy1001: Synchronized php-1.35.0-wmf.34/includes/specials/pagers/ContribsPager.php: revert contribs limit to 5000 [[phab:T234450|T234450]] (duration: 00m 58s)
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 12:59 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:56 cdanis@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: {{Gerrit|5debc3223}} limit per-user Special:Contributions concurrency to 2 [[phab:T234450|T234450]] (duration: 00m 58s)
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:50 kormat@cumin1001: dbctl commit (dc=all): 'Pool db2140 into s4 [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11363 and previous config saved to /var/cache/conftool/dbconfig/20200602-125012-kormat.json
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 12:39 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:31 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw217[3-9].codfw.wmnet
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:30 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2110, copy to db2140 complete [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11362 and previous config saved to /var/cache/conftool/dbconfig/20200602-123020-kormat.json
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw217[3-9].codfw.wmnet
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:10 kart_: Finished EU Mid-day SWAT.
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:08 mutante: contint1001 - common issue after reinstalls again - a2dismod mpm_event ; systemctl restart apache2 ; puppet agent -tv ( [[phab:T196968|T196968]])  https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 11:07 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}601174{{!}}Create URL campaign for African languages for COVID-19 translation project (T253305)]] (duration: 01m 00s)
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:01 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:48 mutante: LDAP - added uid=lulu to group nda ([[phab:T254121|T254121]])
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:29 akosiaris: switch over ores1XXX hosts to redis::misc from oresrdb hosts. [[phab:T254226|T254226]]
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:12 jynus: disable non-global root login to gerrit2001 [[phab:T254162|T254162]]
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 10:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1121, db1148 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11361 and previous config saved to /var/cache/conftool/dbconfig/20200602-101150-marostegui.json
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:09 akosiaris: switch over ores2XXX hosts to redis::misc from oresrdb hosts. [[phab:T254226|T254226]]
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121, db1148 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11360 and previous config saved to /var/cache/conftool/dbconfig/20200602-100246-marostegui.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121, db1148 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11359 and previous config saved to /var/cache/conftool/dbconfig/20200602-095321-marostegui.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11358 and previous config saved to /var/cache/conftool/dbconfig/20200602-094914-marostegui.json
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121, db1148 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11357 and previous config saved to /var/cache/conftool/dbconfig/20200602-094441-marostegui.json
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1148 to dbctl depooled [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11356 and previous config saved to /var/cache/conftool/dbconfig/20200602-093841-marostegui.json
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:59 ema: upload purged 0.15 to buster-wikimedia
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:09 mutante: re-imaging contint1001 with buster
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:43 marostegui: Stop MySQL on db1121
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone db1148', diff saved to https://phabricator.wikimedia.org/P11353 and previous config saved to /var/cache/conftool/dbconfig/20200602-074027-marostegui.json
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after data check', diff saved to https://phabricator.wikimedia.org/P11351 and previous config saved to /var/cache/conftool/dbconfig/20200602-073245-marostegui.json
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:22 marostegui: Stop slave on db1079 for data check
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for data check', diff saved to https://phabricator.wikimedia.org/P11350 and previous config saved to /var/cache/conftool/dbconfig/20200602-072214-marostegui.json
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:06 marostegui: Stop MySQL and poweroff on db1138 for on-site maintenance - [[phab:T253808|T253808]]
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:01 marostegui: Stop mysql on db1141 to save a binary backup - [[phab:T249188|T249188]]
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:03 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|I06897bcc92c5}} (duration: 00m 59s)
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul


== 2020-06-01 ==
== 2022-09-20 ==
* 20:14 shdubsh: downgrade mtail to rc5 in ulsfo -- [[phab:T254192|T254192]]
* 20:19 cjming: end of UTC late backport window
* 20:12 XioNoX: enable IX4/6 on cr4-ulsfo - [[phab:T237575|T237575]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:57 XioNoX: disable IX4/6 on cr4-ulsfo - [[phab:T237575|T237575]]
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 19:55 XioNoX: fail vrrp over cr3-ulsfo - [[phab:T237575|T237575]]
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:44 shdubsh: restart atsmtail in eqsin
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:570395{{!}}Enable kask-transition for all wikis]] (duration: 01m 00s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:59 XioNoX: offline cr1-codfw:fpc0 - [[phab:T254110|T254110]]
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 17:47 XioNoX: turn online cr1-codfw:fpc0 - [[phab:T254110|T254110]]
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 17:46 shdubsh: update mtail in ulsfo caching hosts. restarting atsmtail and varnishmtail
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 17:31 mutante: backup1001 - queued job 42 - gerrit backup after renaming of the file set and addition of LFS data ([[phab:T254155|T254155]], [[phab:T254162|T254162]]) it is incremental, the full one already ran
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 16:49 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging - fix searchsatisfaction schema URI - testwiki only - [[phab:T249261|T249261]] (duration: 00m 59s)
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 16:48 otto@deploy1001: sync-file aborted: EventLogging - fix searchsatisfaction schema URI - testwiki only - [[phab:T249261|T249261]] (duration: 00m 02s)
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 16:39 bstorm_: running view updates on db1141 [[phab:T252219|T252219]]
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 14:53 elukey: ganeti: increase memory available for an-launcher1001 from 8g to 12g - [[phab:T254125|T254125]]
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 14:44 volans: deploying ulsfo mgmt DNS records automatically generated by Netbox ( operations/dns/+/585545/ ) - [[phab:T233183|T233183]]
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1142, db1147 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11345 and previous config saved to /var/cache/conftool/dbconfig/20200601-120000-marostegui.json
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142, db1147 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11344 and previous config saved to /var/cache/conftool/dbconfig/20200601-114440-marostegui.json
* 18:50 jynus: restart db2100:s7 to apply new config
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142, db1147 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11343 and previous config saved to /var/cache/conftool/dbconfig/20200601-113032-marostegui.json
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 10:49 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:601328{{!}} Bumping portals to master (601328)]] (duration: 00m 59s)
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 10:48 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:601328{{!}} Bumping portals to master (601328)]] (duration: 01m 03s)
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 09:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 09:30 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 09:26 jynus: reenabling puppet on all db/es/pc hosts after deploy of gerrit:599596
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142, db1147 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11342 and previous config saved to /var/cache/conftool/dbconfig/20200601-092220-marostegui.json
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1147 to dbctl, depooled [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11341 and previous config saved to /var/cache/conftool/dbconfig/20200601-091809-marostegui.json
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 09:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 09:05 filippo@cumin1001: START - Cookbook sre.hosts.decommission
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:05 XioNoX: offline cr1-codfw:fpc0 - [[phab:T254110|T254110]]
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:05 filippo@cumin1001: START - Cookbook sre.hosts.decommission
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 09:03 filippo@cumin1001: START - Cookbook sre.hosts.decommission
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:58 godog: prometheus eqiad lvextend --resizefs --size +100G vg-ssd/prometheus-ops
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 08:43 mutante: deneb - apt-get remove --purge apt-listchanges (packages was in status "rc" causing DPKG alert, should be removed but config was not purged)
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:41 mutante: deneb - apt-get remove python3-debconf (package was in status "ri" causing DPKG icinga alert. ri means it should be removed but is not)
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 08:33 XioNoX: restart cr1-codfw:fpc0 - [[phab:T254110|T254110]]
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 08:22 mutante: mw1331 re-enabled puppet (SAL told me about an experiment a little while ago)
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 08:19 jynus: disabling puppet on all db/es/pc hosts for deploy of gerrit:599596
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 to clone db1147 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11339 and previous config saved to /var/cache/conftool/dbconfig/20200601-070519-marostegui.json
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool enwiki db2071 slave to test new index - [[phab:T238966|T238966]]', diff saved to https://phabricator.wikimedia.org/P11338 and previous config saved to /var/cache/conftool/dbconfig/20200601-050354-marostegui.json
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 04:54 marostegui: Drop testreduce_0715 from m5 master [[phab:T245408|T245408]]
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 04:44 marostegui: Depool db1141 from Analytics role - [[phab:T249188|T249188]]
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 16:42 dancy@deploy1002: Sync cancelled.
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 16:09 awight@deploy1002: backport aborted:  (duration: 00m 33s)
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: {{Gerrit|1ac09d4709c645558f644a885fadc49c05cc04b9}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 39s)
* 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: {{Gerrit|1a27e05a7ca53a063d5f9e284d6a09546ac8691c}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 52s)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
* 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0b55db6f80df5f4c89f969332a6b31077a7172c4}}: Enable Tech Wishes survey on dewiki ([[phab:T316676|T316676]]) (duration: 04m 12s)
* 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
* 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
* 08:35 hashar: Restarted CI Jenkins for plugin update
* 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832993{{!}}testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias (T317289)]] (duration: 03m 46s)
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:10 kart_: Updated cxserver to 2022-09-15-113346-production ([[phab:T317289|T317289]], [[phab:T315209|T315209]])
* 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 36m 08s)
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-05-31 ==
== 2022-09-19 ==
* 09:56 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Vox Golf' 'Colonel Chicken' ([[phab:T254068|T254068]])
* 22:59 ebernhardson: [[phab:T317200|T317200]] start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
* 21:21 maryum: Deployed security patch for [[phab:T302479|T302479]]
* 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
* 21:15 sbassett: Deployed security patch for [[phab:T312820|T312820]]
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 cjming: end of UTC late backport window
* 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: [[gerrit:833031{{!}}Add token_count subfield to outgoing_link (T317546)]] (duration: 03m 51s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820459{{!}}Wikifunctions: Drop two config items moved to docker]] (duration: 03m 38s)
* 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:829877{{!}}ExtensionDistributor: Add REL1_39 (T313925)]] (duration: 03m 38s)
* 20:12 cjming@deploy1002: Finished scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] (duration: 06m 31s)
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 cjming@deploy1002: cjming and arlolra: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:06 cjming@deploy1002: Started scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]]
* 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
* 17:42 dancy@deploy1002: Installing scap version "4.21.0" for 561 hosts
* 17:36 dancy@deploy1002: Sync cancelled.
* 17:36 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:36 dancy@deploy1002: Started scap: testing, disregard
* 14:03 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage<nowiki>{</nowiki>.png,-1.5x.png,-2x.png<nowiki>}</nowiki> ([[phab:T317718|T317718]])
* 14:02 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|6c7151d969b6997bd9cce042b7bc78c282dd9b26}}: Regenerate ukwikivoyage logo ([[phab:T317718|T317718]]) (duration: 03m 46s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbf161d148228e0e706813f923ab1a5d4b42757a}}: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro ([[phab:T314518|T314518]]) (duration: 04m 01s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a6c1ddf5cd1a46ab05f5d6fda4b938a3ee37238}}: Remove unnecessary wgNamespaceAliases from bnwiki ([[phab:T318003|T318003]]) (duration: 04m 16s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-05-29 ==
== 2022-09-17 ==
* 22:32 bstorm_: updated views on labsdb1010 [[phab:T252219|T252219]]
* 12:17 Emperor: set thanos ring replicas to 3.80 [[phab:T311690|T311690]]
* 20:55 bstorm_: updating views on labsdb1011 [[phab:T252219|T252219]]
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34879 and previous config saved to /var/cache/conftool/dbconfig/20220917-103903-ladsgroup.json
* 19:27 ryankemper: Successfully finished a rolling restart of the `cloudelastic` clusters (chi, psi, omega) as part of elasticsearch plugins upgrade. Host and service checks re-enabled.
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34878 and previous config saved to /var/cache/conftool/dbconfig/20220917-102356-ladsgroup.json
* 17:28 bstorm_: updating views on labsdb1009 [[phab:T252219|T252219]]
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34877 and previous config saved to /var/cache/conftool/dbconfig/20220917-100850-ladsgroup.json
* 16:50 ryankemper: Performing a rolling restart of the `cloudelastic` clusters (chi, psi, omega) as part of elasticsearch plugins upgrade. Host and service checks disabled.
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34876 and previous config saved to /var/cache/conftool/dbconfig/20220917-095344-ladsgroup.json
* 16:00 bstorm_: Updating views on labsdb1012 [[phab:T252219|T252219]]
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34875 and previous config saved to /var/cache/conftool/dbconfig/20220917-094856-ladsgroup.json
* 15:59 ryankemper: Concluded rolling restart of the `relforge` clusters as part of elasticsearch plugins upgrade. Both hosts `relforge1001` and `relforge1002` are back up. Downtime lifted.
* 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34874 and previous config saved to /var/cache/conftool/dbconfig/20220917-093349-ladsgroup.json
* 15:29 ryankemper: Performing a rolling restart of the `relforge` clusters as part of elasticsearch plugins upgrade
* 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34873 and previous config saved to /var/cache/conftool/dbconfig/20220917-091843-ladsgroup.json
* 14:59 cdanis: disabling puppet on netflow* to deploy {{Gerrit|Ic71e96f0}} [[phab:T253128|T253128]]
* 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34872 and previous config saved to /var/cache/conftool/dbconfig/20220917-090336-ladsgroup.json
* 14:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34871 and previous config saved to /var/cache/conftool/dbconfig/20220917-074806-ladsgroup.json
* 14:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34870 and previous config saved to /var/cache/conftool/dbconfig/20220917-073300-ladsgroup.json
* 14:41 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34869 and previous config saved to /var/cache/conftool/dbconfig/20220917-071753-ladsgroup.json
* 14:41 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34868 and previous config saved to /var/cache/conftool/dbconfig/20220917-070247-ladsgroup.json
* 14:35 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34867 and previous config saved to /var/cache/conftool/dbconfig/20220917-051719-ladsgroup.json
* 14:35 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 14:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 14:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34866 and previous config saved to /var/cache/conftool/dbconfig/20220917-051527-ladsgroup.json
* 14:15 mdholloway: ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on commonswiki ([[phab:T253821|T253821]])
* 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 12:49 hnowlan: reimaging restbase2009 after disk replacement
* 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 12:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34865 and previous config saved to /var/cache/conftool/dbconfig/20220917-051203-ladsgroup.json
* 12:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:15 godog: roll-restart to upgrade thanos to 0.13.0rc0 - [[phab:T252186|T252186]] [[phab:T233956|T233956]]
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 11:32 moritzm: installing cups security updates (client-side libs/tools)
* 11:01 ema: upload prometheus-rdkafka-exporter 0.2 to buster-wikimedia [[phab:T253551|T253551]]
* 10:53 moritzm: updating mwdebug2002 to 7.2.31
* 10:02 marostegui: Compress InnoDB on db1138 [[phab:T232446|T232446]]
* 08:30 godog: update swift uid/gid on thanos hosts - [[phab:T123918|T123918]]
* 08:04 mutante: phabricator - restarted apache2 - back for me now
* 08:03 XioNoX: add new AMS-IX link to LACP bundle
* 08:01 mutante: phabricator - broken due to "PhabricatorRepositoryMirrorEngine::pushToGitRepository" starting git process that uses 100% CPU, stopped phd service
* 07:56 mutante: phabricator - killed pid 25070 (git) which used 100% of CPU, restarted phd service
* 07:25 moritzm: updating perf on buster systems to new version from 10.4 point release
* 07:15 moritzm: installing el-api update from latest Buster point release
* 07:12 moritzm: installing xdg-utils update from latest Buster point release
* 07:11 mutante: mw1293 (canary jobrunner ) replace apache2.conf with version from mwdebug1001, restart apache, to debug for [[phab:T190111|T190111]]
* 07:00 moritzm: installing rake security updates
* 06:36 mutante: deneb - systemctl start docker-reporter-releng-images
* 05:20 marostegui: Deploy schema change on db1138 (no longer s4 master) - [[phab:T250055|T250055]]
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1081 to s4 master and remove read-only from s4 [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11334 and previous config saved to /var/cache/conftool/dbconfig/20200529-050224-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11333 and previous config saved to /var/cache/conftool/dbconfig/20200529-050153-marostegui.json
* 05:00 marostegui: Starting s4 failover from db1138 to db1081 -[[phab:T253808|T253808]]
* 04:25 marostegui: Start topology changes in s4 - [[phab:T253808|T253808]]


== 2020-05-28 ==
== 2022-09-16 ==
* 23:48 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/skins/Vector/resources/skins.vector.styles/Menu.less: [[phab:T253912|T253912]] Hotfix: Cannot rename emptyPortlet to empty-portlet yet (duration: 00m 59s)
* 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:41 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/WikibaseMediaInfo/src/Services/FilePageLookup.php: [[phab:T253792|T253792]] Follow-up {{Gerrit|1827c7a}}: Ensure inNamespace() is called only on Title object (duration: 00m 58s)
* 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T253821|T253821]] Update MachineVision block list for 2020-05-27 (duration: 00m 57s)
* 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34864 and previous config saved to /var/cache/conftool/dbconfig/20220916-212905-ladsgroup.json
* 22:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Move one CheckUser right change next to the other (duration: 00m 57s)
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34863 and previous config saved to /var/cache/conftool/dbconfig/20220916-211358-ladsgroup.json
* 22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove version wrapper around wgOverrideUcfirstCharacters; always true (duration: 00m 59s)
* 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34862 and previous config saved to /var/cache/conftool/dbconfig/20220916-205852-ladsgroup.json
* 21:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.34
* 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34861 and previous config saved to /var/cache/conftool/dbconfig/20220916-204345-ladsgroup.json
* 21:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/includes/filerepo/FileRepo.php: [[phab:T253922|T253922]] Mark two FileRepo functions public (duration: 01m 07s)
* 19:16 mutante: cp1081 /usr/local/sbin/update-ocsp-all
* 21:12 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/includes/specials/SpecialUserrights.php: [[phab:T253909|T253909]] Restore visibility (previously implicitely public) (duration: 01m 06s)
* 17:01 mutante: gitlab-runner*: deployed gerrit:832584 and systemctl restart buildkitd on 6 hosts for [[phab:T317904|T317904]]
* 20:38 jforrester@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/resources/skins.vector.styles: [[phab:T253905|T253905]] HOTFIX: Do not apply p-personal absolute positioning to all menus (duration: 01m 07s)
* 16:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 20:22 shdubsh: restart varnishmtail and atsmtail eqsin
* 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 20:11 shdubsh: restart ncredirmtail on ncredir5001
* 16:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 19:20 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back the train due to [[phab:T253905|T253905]]
* 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 19:20 twentyafterfour: group2 back to wmf.32 due to [[phab:T253905|T253905]]
* 16:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 19:20 milimetric@deploy1001: Finished deploy [analytics/refinery@f6d73c8] (thin): Hotfix #2 today (thin): forgot jars [analytics/refinery@f6d73c8] (duration: 00m 09s)
* 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 19:20 milimetric@deploy1001: Started deploy [analytics/refinery@f6d73c8] (thin): Hotfix #2 today (thin): forgot jars [analytics/refinery@f6d73c8]
* 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:17 milimetric@deploy1001: Finished deploy [analytics/refinery@f6d73c8]: Hotfix #2 today: forgot jars [analytics/refinery@f6d73c8] (duration: 16m 54s)
* 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]]
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184
* 19:01 shdubsh: restart varnishmtail and atsmtail on cp5001.eqsin.wmnet
* 16:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2184
* 19:00 milimetric@deploy1001: Started deploy [analytics/refinery@f6d73c8]: Hotfix #2 today: forgot jars [analytics/refinery@f6d73c8]
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2183
* 17:03 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]] (duration: 01m 06s)
* 16:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2183
* 17:02 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]]
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34860 and previous config saved to /var/cache/conftool/dbconfig/20220916-161409-ladsgroup.json
* 16:32 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/Wikibase: [[phab:T253804|T253804]] Use ThrowingEntityTermStoreWriter when writers shouldn't be called (duration: 01m 15s)
* 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 15:37 milimetric@deploy1001: Finished deploy [analytics/refinery@203d182] (thin): Three hotfixes (THIN) [analytics/refinery@203d182] (duration: 00m 10s)
* 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 15:37 milimetric@deploy1001: Started deploy [analytics/refinery@203d182] (thin): Three hotfixes (THIN) [analytics/refinery@203d182]
* 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34859 and previous config saved to /var/cache/conftool/dbconfig/20220916-161346-ladsgroup.json
* 15:05 milimetric@deploy1001: Finished deploy [analytics/refinery@203d182]: Three hotfixes [analytics/refinery@203d182] (duration: 25m 59s)
* 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34858 and previous config saved to /var/cache/conftool/dbconfig/20220916-155840-ladsgroup.json
* 15:02 moritzm: installing exim4 security updates on jessie (stretch/buster already fixed)
* 15:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:39 milimetric@deploy1001: Started deploy [analytics/refinery@203d182]: Three hotfixes [analytics/refinery@203d182]
* 15:52 dancy@deploy1002: Installation of scap version "4.20.0" completed for 561 hosts
* 14:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 dancy@deploy1002: Installing scap version "4.20.0" for 561 hosts
* 14:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:01 ema: atskafka 0.8 uploaded to buster-wikimedia [[phab:T253551|T253551]]
* 15:44 dancy@deploy1002: Finished scap: testing (duration: 04m 53s)
* 13:49 godog: roll-restart prometheus k8s-staging to enable thanos upload - [[phab:T252186|T252186]]
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34857 and previous config saved to /var/cache/conftool/dbconfig/20220916-154333-ladsgroup.json
* 13:36 hashar: Restarting CI Jenkins for plugin rollback
* 15:39 dancy@deploy1002: Started scap: testing
* 11:49 moritzm: installing unbound security updates
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34856 and previous config saved to /var/cache/conftool/dbconfig/20220916-152827-ladsgroup.json
* 11:03 kormat@cumin1001: dbctl commit (dc=all): 'Add db2138 to s2+s4 [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11330 and previous config saved to /var/cache/conftool/dbconfig/20200528-110333-kormat.json
* 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:36 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 10:34 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 10:30 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 15:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 10:02 mutante: gerrit1002 (test server) - chown -R gerrit2:gerrit2 /var/lib/gerrit/review_site ; restarted gerrit service, now the service is not in restart loop anymore, gerrit-ssh is listening too, just not accepting publickey ([[phab:T239151|T239151]])
* 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:51 XioNoX: failover VRRP in ulsfo
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:41 XioNoX: re-activate peering/transit on cr2-eqdfw - [[phab:T243080|T243080]]
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:35 mutante: restarting gerrit on gerrit1002 after fixing db_pass to the readonly one ([[phab:T243800|T243800]])
* 15:01 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:33 XioNoX: restart cr2-eqdfw for upgrade - [[phab:T243080|T243080]]
* 15:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:30 XioNoX: deactivate peering/transit on cr2-eqdfw - [[phab:T243080|T243080]]
* 15:01 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:25 _joe_: updating ACLs on all etcd servers
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:22 XioNoX: install new Junos on cr2-eqdfw - [[phab:T243080|T243080]]
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 09:16 XioNoX: rollback cr2-eqord ospf/bgp - [[phab:T243080|T243080]]
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 09:07 XioNoX: restart cr2-eqord for upgrade - [[phab:T243080|T243080]]
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 09:05 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 14:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 08:50 _joe_: upgrading etcd ACLs (adding new users) to conf1004
* 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 08:50 XioNoX: install new Junos on cr2-eqord - [[phab:T243080|T243080]]
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:46 XioNoX: deactivate peering/transit on cr2-eqord - [[phab:T243080|T243080]]
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 08:45 XioNoX: de-pref all OSPF links to cr2-eqord - [[phab:T243080|T243080]]
* 14:42 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 08:13 marostegui: Pool db1141 into labsdb analytics role - [[phab:T249188|T249188]]
* 14:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 07:33 gilles@deploy1001: Synchronized static/images: [[phab:T252108|T252108]] Deploying optimised static PNGs (duration: 01m 39s)
* 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 07:31 gilles@deploy1001: Synchronized static/apple-touch: [[phab:T252108|T252108]] Deploying optimised static PNGs (duration: 01m 12s)
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1081 from API and set its weight to 0 on main traffic - preparation for tomorrow's failover [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11329 and previous config saved to /var/cache/conftool/dbconfig/20200528-063037-marostegui.json
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 04:44 marostegui: Run check_private data on db1141 - [[phab:T249188|T249188]]
* 14:17 godog: add 100G to prometheus/eqiad instance k8s-mlserve
* 04:22 marostegui: Stop MySQL on db1141 - [[phab:T249188|T249188]]
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34855 and previous config saved to /var/cache/conftool/dbconfig/20220916-131902-root.json
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34854 and previous config saved to /var/cache/conftool/dbconfig/20220916-130357-root.json
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34853 and previous config saved to /var/cache/conftool/dbconfig/20220916-125841-root.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34852 and previous config saved to /var/cache/conftool/dbconfig/20220916-124850-root.json
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34851 and previous config saved to /var/cache/conftool/dbconfig/20220916-124336-root.json
* 12:43 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34850 and previous config saved to /var/cache/conftool/dbconfig/20220916-123346-root.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34849 and previous config saved to /var/cache/conftool/dbconfig/20220916-122831-root.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34848 and previous config saved to /var/cache/conftool/dbconfig/20220916-121841-root.json
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34847 and previous config saved to /var/cache/conftool/dbconfig/20220916-121326-root.json
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34846 and previous config saved to /var/cache/conftool/dbconfig/20220916-120336-root.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34845 and previous config saved to /var/cache/conftool/dbconfig/20220916-115821-root.json
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34844 and previous config saved to /var/cache/conftool/dbconfig/20220916-114935-root.json
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34843 and previous config saved to /var/cache/conftool/dbconfig/20220916-114831-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34842 and previous config saved to /var/cache/conftool/dbconfig/20220916-114316-root.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P34841 and previous config saved to /var/cache/conftool/dbconfig/20220916-113543-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34840 and previous config saved to /var/cache/conftool/dbconfig/20220916-113431-root.json
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34839 and previous config saved to /var/cache/conftool/dbconfig/20220916-113325-root.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P34838 and previous config saved to /var/cache/conftool/dbconfig/20220916-112750-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34837 and previous config saved to /var/cache/conftool/dbconfig/20220916-111925-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34836 and previous config saved to /var/cache/conftool/dbconfig/20220916-110420-root.json
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34835 and previous config saved to /var/cache/conftool/dbconfig/20220916-105819-ladsgroup.json
* 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34834 and previous config saved to /var/cache/conftool/dbconfig/20220916-105809-ladsgroup.json
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34832 and previous config saved to /var/cache/conftool/dbconfig/20220916-104916-root.json
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34831 and previous config saved to /var/cache/conftool/dbconfig/20220916-104303-ladsgroup.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34830 and previous config saved to /var/cache/conftool/dbconfig/20220916-103411-root.json
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34829 and previous config saved to /var/cache/conftool/dbconfig/20220916-102756-ladsgroup.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34828 and previous config saved to /var/cache/conftool/dbconfig/20220916-101905-root.json
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34827 and previous config saved to /var/cache/conftool/dbconfig/20220916-101250-ladsgroup.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34826 and previous config saved to /var/cache/conftool/dbconfig/20220916-100400-root.json
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34825 and previous config saved to /var/cache/conftool/dbconfig/20220916-093635-root.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34824 and previous config saved to /var/cache/conftool/dbconfig/20220916-093121-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34823 and previous config saved to /var/cache/conftool/dbconfig/20220916-092130-root.json
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34822 and previous config saved to /var/cache/conftool/dbconfig/20220916-091616-root.json
* 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34821 and previous config saved to /var/cache/conftool/dbconfig/20220916-091234-ladsgroup.json
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34820 and previous config saved to /var/cache/conftool/dbconfig/20220916-090625-root.json
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34819 and previous config saved to /var/cache/conftool/dbconfig/20220916-090111-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34818 and previous config saved to /var/cache/conftool/dbconfig/20220916-085120-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34817 and previous config saved to /var/cache/conftool/dbconfig/20220916-084607-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34816 and previous config saved to /var/cache/conftool/dbconfig/20220916-083615-root.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34815 and previous config saved to /var/cache/conftool/dbconfig/20220916-083102-root.json
* 08:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34814 and previous config saved to /var/cache/conftool/dbconfig/20220916-082110-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34813 and previous config saved to /var/cache/conftool/dbconfig/20220916-081557-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34812 and previous config saved to /var/cache/conftool/dbconfig/20220916-080605-root.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34811 and previous config saved to /var/cache/conftool/dbconfig/20220916-080052-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34810 and previous config saved to /var/cache/conftool/dbconfig/20220916-075100-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34809 and previous config saved to /var/cache/conftool/dbconfig/20220916-074548-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34808 and previous config saved to /var/cache/conftool/dbconfig/20220916-074251-root.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P34807 and previous config saved to /var/cache/conftool/dbconfig/20220916-072958-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34806 and previous config saved to /var/cache/conftool/dbconfig/20220916-072746-root.json
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34805 and previous config saved to /var/cache/conftool/dbconfig/20220916-071241-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34804 and previous config saved to /var/cache/conftool/dbconfig/20220916-065737-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34803 and previous config saved to /var/cache/conftool/dbconfig/20220916-064232-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34802 and previous config saved to /var/cache/conftool/dbconfig/20220916-062727-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34801 and previous config saved to /var/cache/conftool/dbconfig/20220916-061222-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34800 and previous config saved to /var/cache/conftool/dbconfig/20220916-055717-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34799 and previous config saved to /var/cache/conftool/dbconfig/20220916-055542-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34798 and previous config saved to /var/cache/conftool/dbconfig/20220916-055424-root.json
* 05:51 marostegui: Install 10.6 on db1168 [[phab:T301879|T301879]]
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34797 and previous config saved to /var/cache/conftool/dbconfig/20220916-055031-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P34795 and previous config saved to /var/cache/conftool/dbconfig/20220916-054438-root.json
* 01:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
* 01:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 01:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
* 01:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 00:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 17s)
* 00:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)


== 2020-05-27 ==
== 2022-09-15 ==
* 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add autoreviewrestore right to rollbacker group on hiwiki ([[phab:T252986|T252986]]) (duration: 01m 05s)
* 23:51 mutante: gerrit1001 - disabled puppet - gerrit:832411
* 23:16 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add thwiki Draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE there ([[phab:T252959|T252959]]) (duration: 01m 06s)
* 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 22:58 gehel@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 22:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 22:02 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4) (duration: 00m 10s)
* 21:30 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 22:02 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4)
* 20:25 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] (duration: 07m 06s)
* 22:01 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3) (duration: 01m 29s)
* 20:18 thcipriani@deploy1002: thcipriani and dani: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 22:00 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3)
* 20:18 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]]
* 22:00 crusnov@deploy1001: deploy aborted: Netbox Upgrade to 2.8.4 (part2) (duration: 01m 31s)
* 20:15 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] (duration: 07m 39s)
* 21:58 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part2)
* 20:08 thcipriani@deploy1002: thcipriani and dcausse: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 21:58 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1) (duration: 01m 01s)
* 20:07 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]]
* 21:57 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1)
* 19:26 ebernhardson: pool'd wdqs2001, some blockers before reload can start [[phab:T316236|T316236]]
* 20:43 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 18:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.1 refs [[phab:T314190|T314190]]
* 20:28 marostegui: Decrease  innodb poolsize on s4 master and restart mysql
* 18:39 dancy@deploy1002: Finished scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] (duration: 09m 53s)
* 20:11 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@9dc827f]: Update mobileapps to {{Gerrit|b3b9214c}} ([[phab:T253648|T253648]]) (duration: 03m 31s)
* 18:38 cwhite: restart thanos-compact (thanos-fe2001) and swift_ring_manager (thanos-fe1001)
* 20:08 mbsantos@deploy1001: Started deploy [mobileapps/deploy@9dc827f]: Update mobileapps to {{Gerrit|b3b9214c}} ([[phab:T253648|T253648]])
* 18:29 dancy@deploy1002: dancy and cscott: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:04 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.32  refs [[phab:T253022|T253022]] (duration: 01m 04s)
* 18:29 dancy@deploy1002: Started scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]]
* 20:03 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.32 refs [[phab:T253022|T253022]]
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2003.codfw.wmnet on all recursors
* 20:00 gehel@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2003.codfw.wmnet on all recursors
* 19:56 twentyafterfour@deploy1001: scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 19:46 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/includes/parser/CoreParserFunctions.php: [[phab:T253725|T253725]] Partially revert 'Fix impedance mismatch with Parser::getRevisionRecordObject()' (duration: 01m 05s)
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 19:12 joal@deploy1001: Finished deploy [analytics/refinery@8a3dcb3]: Analytics regular weekly train (an-launcher1001 only) [{{Gerrit|8a3dcb3}}] (duration: 06m 07s)
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2001.codfw.wmnet on all recursors
* 19:09 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: [[phab:T32405|T32405]] Stop special casing the main page on mobile for twelve wikis (duration: 01m 05s)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2001.codfw.wmnet on all recursors
* 19:06 joal@deploy1001: Started deploy [analytics/refinery@8a3dcb3]: Analytics regular weekly train (an-launcher1001 only) [{{Gerrit|8a3dcb3}}]
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1003.eqiad.wmnet on all recursors
* 19:03 joal@deploy1001: Finished deploy [analytics/refinery@8a3dcb3] (thin): Analytics regular weekly train THIN [{{Gerrit|8a3dcb3}}] (duration: 00m 08s)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1003.eqiad.wmnet on all recursors
* 19:03 joal@deploy1001: Started deploy [analytics/refinery@8a3dcb3] (thin): Analytics regular weekly train THIN [{{Gerrit|8a3dcb3}}]
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1002.eqiad.wmnet on all recursors
* 19:03 joal@deploy1001: Finished deploy [analytics/refinery@8a3dcb3]: Analytics regular weekly train [{{Gerrit|8a3dcb3}}] (duration: 21m 20s)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1002.eqiad.wmnet on all recursors
* 18:41 joal@deploy1001: Started deploy [analytics/refinery@8a3dcb3]: Analytics regular weekly train [{{Gerrit|8a3dcb3}}]
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1001.eqiad.wmnet on all recursors
* 18:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable DiscussionTools as beta on mediawiki.org, part II [[phab:T251208|T251208]] (duration: 01m 05s)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1001.eqiad.wmnet on all recursors
* 17:56 jayme: updated tiller to 2.16.7-wmf1 for all services in kubernetes cluster: eqiad
* 18:15 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 17:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable DiscussionTools as beta on mediawiki.org [[phab:T251208|T251208]] (duration: 01m 05s)
* 18:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:42 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 18:13 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 17:40 gehel: repool maps2003
* 18:07 godog: restart envoyproxy on thanos-fe*
* 17:32 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/Translate/: Deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Translate/+/599027/ to wmf.34 refs [[phab:T253748|T253748]] and [[phab:T253022|T253022]] (duration: 01m 07s)
* 18:06 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 16:55 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:06 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 16:53 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:26 hnowlan@deploy1001: Finished deploy [cpjobqueue/deploy@c8c653e]: Disabling ThumbnailRender as a test of k8s cpjobqueue (duration: 01m 57s)
* 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging BryanDavis out of all services on: 2047 hosts
* 16:24 hnowlan@deploy1001: Started deploy [cpjobqueue/deploy@c8c653e]: Disabling ThumbnailRender as a test of k8s cpjobqueue
* 16:16 andrew@cumin1001: START - Cookbook sre.idm.logout Logging BryanDavis out of all services on: 2047 hosts
* 16:10 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:39 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:09 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:37 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 16:06 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 15:52 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop defining wmgUsePerformanceInspector, unread [[phab:T253689|T253689]] (duration: 01m 04s)
* 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 15:52 gehel@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 15:52 jayme: updated tiller to 2.16.7-wmf1 for all services in kubernetes cluster: codfw
* 15:22 hnowlan: starting cassandra on sessionstore1001-a
* 15:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop loading PerformanceInspector on any wiki [[phab:T253689|T253689]] (duration: 01m 06s)
* 15:18 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 15:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34792 and previous config saved to /var/cache/conftool/dbconfig/20220915-151131-ladsgroup.json
* 15:02 godog: eqiad-prod: decom ms-be101[678] - [[phab:T252008|T252008]]
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34791 and previous config saved to /var/cache/conftool/dbconfig/20220915-145625-ladsgroup.json
* 14:58 jayme: updated tiller to 2.16.7-wmf1 for all services in cluster: staging
* 14:41 moritzm: installing libtirpc security updates
* 14:58 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34790 and previous config saved to /var/cache/conftool/dbconfig/20220915-144118-ladsgroup.json
* 14:56 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34789 and previous config saved to /var/cache/conftool/dbconfig/20220915-142612-ladsgroup.json
* 14:56 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:01 sukhe: retarting bird.service on A:dns-auth for zlib update
* 14:54 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6b9784a0708cf1e7762034ccfba7e5604b2f6dc2}}: Enable the Vue version of the mentee overview in pilot wikis ([[phab:T300532|T300532]]) (duration: 03m 45s)
* 14:52 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d] (duration: 00m 09s)
* 14:51 cdanis: cumin1001: upgrading python3-conftool and python3-conftool-dbctl
* 13:58 aqu@deploy1002: Started deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d]
* 14:50 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:57 sukhe: retarting haproxy.service on A:dns-auth for zlib update
* 14:50 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:57 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d] (duration: 00m 10s)
* 14:46 cdanis: cumin2001: upgrading python3-conftool and python3-conftool-dbctl
* 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d]
* 14:43 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:51 jayme: updated rsyslog to 8.2208.0-1~bpo11+1 on all kubernetes masters and nodes - [[phab:T289766|T289766]]
* 14:43 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:40 cdanis: reprepro: upload conftool_1.3.1-1<nowiki>{</nowiki>,+deb10u1<nowiki>}</nowiki> to <nowiki>{</nowiki>stretch,buster<nowiki>}</nowiki>-wikimedia
* 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383] (duration: 06m 01s)
* 14:36 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:36 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:32 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:32 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:41 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383]
* 14:30 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:38 sukhe: restarting bird.service on A:dns-rec for zlib update
* 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1146:3312, db1146:3314 and db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P11318 and previous config saved to /var/cache/conftool/dbconfig/20200527-141635-marostegui.json
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:13 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:13 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:07 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:07 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:33 sukhe: restarting pdns-recursor on A:dns-rec for zlib update
* 14:04 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/GrowthExperiments/: {{Gerrit|f592e85858d17a2de99cde93627054ee4972c2bd}}: Mentee overview: avoid requiring the non-vue mentee overview script when loading the Vue one ([[phab:T300532|T300532]]) (duration: 04m 05s)
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1146:3312, db1146:3314 and db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P11317 and previous config saved to /var/cache/conftool/dbconfig/20200527-140442-marostegui.json
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:03 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:03 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 12:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 13:58 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 13:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 12:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS buster
* 13:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:17 jayme: fleet wide update of prometheus-rsyslog-exporter to 0.0.0+git20201008-4 - [[phab:T289766|T289766]]
* 13:48 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 13:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 13:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34787 and previous config saved to /var/cache/conftool/dbconfig/20220915-120013-root.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1146:3312, db1146:3314 and db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P11316 and previous config saved to /var/cache/conftool/dbconfig/20200527-134704-marostegui.json
* 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 13:45 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:50 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 13:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 13:36 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34786 and previous config saved to /var/cache/conftool/dbconfig/20220915-114508-root.json
* 13:34 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 11:44 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 13:34 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 11:43 moritzm: restart exim on lists1001 to pick up zlib security updates
* 13:21 gehel: repool maps2004 / depool maps2003
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34785 and previous config saved to /var/cache/conftool/dbconfig/20220915-113003-root.json
* 13:21 ema: cp: upgrade purged to 0.14
* 11:22 jayme: importing prometheus-rsyslog-exporter 0.0.0+git20201008-4 to stretch-wikimedia, buster-wikimedia, bullseye-wikimedia - [[phab:T289766|T289766]]
* 13:19 marostegui: Kill /usr/local/bin/mwscriptwikiset updateSpecialPages.php s8.dblist --override --only=Fewestrevisions [[phab:T238199|T238199]]
* 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 13:16 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1146:3312, db1146:3314 and db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P11313 and previous config saved to /var/cache/conftool/dbconfig/20200527-131515-marostegui.json
* 11:15 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1146:3312 and db1146:3314 to dbctl [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11312 and previous config saved to /var/cache/conftool/dbconfig/20200527-130820-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34784 and previous config saved to /var/cache/conftool/dbconfig/20220915-111458-root.json
* 13:06 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:12 hnowlan: sessionstore1001: c-foreach-nt drain
* 11:34 Urbanecm: EU SWAT done
* 11:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 11:29 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/GrowthExperiments/: SWAT: {{Gerrit|983eda5}}: Mentorship dialog: Swap panel to ask-help on open ([[phab:T253692|T253692]]) (duration: 01m 06s)
* 11:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 11:18 ema: cp2027: upgrade purged to 0.14
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'pool db2129 into s6 API', diff saved to https://phabricator.wikimedia.org/P34783 and previous config saved to /var/cache/conftool/dbconfig/20220915-110453-root.json
* 11:17 ema: purged 0.14 uploaded to buster-wikimedia
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34782 and previous config saved to /var/cache/conftool/dbconfig/20220915-105953-root.json
* 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}598678{{!}}Enable ContentTranslation in Galician Wikipedia as a default tool (T250355)]] (duration: 01m 18s)
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34781 and previous config saved to /var/cache/conftool/dbconfig/20220915-104448-root.json
* 10:15 hashar: contint2001: starting zuul
* 10:36 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 10:15 hashar: contint2001: started jenkins
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34780 and previous config saved to /var/cache/conftool/dbconfig/20220915-102943-root.json
* 10:03 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown -h jenkins:jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34779 and previous config saved to /var/cache/conftool/dbconfig/20220915-101438-root.json
* 10:02 mutante: repeated rsync of /var/lib/jenkins with -p ; find /var/lib/jenkins -group bacula -user statsite -exec chown -h jenkins:jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34778 and previous config saved to /var/cache/conftool/dbconfig/20220915-101425-root.json
* 09:55 hashar: contint2001: starting jenkins
* 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 09:54 hashar: contint1001 / contint2001 : deleted obsolete files /var/lib/jenkins/.git and /var/lib/jenkins/jobs/_shared/
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 09:52 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown -h jenkins:jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P34777 and previous config saved to /var/cache/conftool/dbconfig/20220915-100212-root.json
* 09:51 godog: roll restart prometheus on the fleet to apply  {{Gerrit|I0e2fe8af}}
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34775 and previous config saved to /var/cache/conftool/dbconfig/20220915-095920-root.json
* 09:49 mutante: contint2001 - find /var/lib/jenkins -group bacula -user statsite -exec chown jenkins:jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 09:48 hashar: contint2001: unmasked jenkins and started it
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 09:42 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 09:42 mutante: switching CI backend from contint1001 to contint2001
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 09:40 mutante: repeated rsync -avp --delete /var/lib/zuul/ rsync://contint2001.wikimedia.org/ci--var-lib-zuul-
* 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:40 hashar: contint1001: masked jenkins and zuul
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34774 and previous config saved to /var/cache/conftool/dbconfig/20220915-094415-root.json
* 09:39 mutante: repeated rsync -avp --delete /var/lib/jenkins/ rsync://contint2001.wikimedia.org/ci--var-lib-jenkins-
* 09:38 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383] (duration: 14m 21s)
* 09:39 hashar: Stopping Zuul and Jenkins CI for scheduled maintenance # [[phab:T224591|T224591]]
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34773 and previous config saved to /var/cache/conftool/dbconfig/20220915-092910-root.json
* 09:35 filippo@cumin1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
* 09:23 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383]
* 08:52 hashar: contint1001: find /srv/jenkins/builds/operations-puppet-wmf-style-guide -type f -name '*.tmp' -delete  # [[phab:T253729|T253729]]
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34772 and previous config saved to /var/cache/conftool/dbconfig/20220915-091405-root.json
* 08:48 marostegui: Stop MySQL on db1103
* 09:13 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383] (duration: 00m 08s)
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 db1103:3314 to clone db1146 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11308 and previous config saved to /var/cache/conftool/dbconfig/20200527-084713-marostegui.json
* 09:13 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383]
* 08:46 arturo: removing more old packages in labstore1006 (all packages in 'rc' state)
* 09:12 aqu@deploy1002: Finished deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383] (duration: 27m 31s)
* 08:43 arturo: running apt-get autoremove on labstore1006
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34770 and previous config saved to /var/cache/conftool/dbconfig/20220915-085900-root.json
* 08:42 jynus: starting again db2097 db instances [[phab:T252492|T252492]]
* 08:49 apergos: UTC backport training window closed at lsat
* 08:11 jayme: updated admin tiller (namespace: kube-system) to 2.16.7-wmf1 in clusters: staging, codfw, eqiad
* 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 08:08 hashar: contint1001 / contint2001 : deleted unused /var/lib/zuul/git  (the real one is /srv/zuul/git )
* 08:45 aqu@deploy1002: Started deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383]
* 08:02 mutante: contint2001 - chown root:root /var/lib/zuul/git
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34768 and previous config saved to /var/cache/conftool/dbconfig/20220915-084355-root.json
* 07:54 XioNoX: test new bird conf on dns4001 - [[phab:T253666|T253666]]
* 08:43 aqu: about to deploy analytics/refinery
* 07:45 hashar: contint2001 also fixing symlink permissions: sudo find /var/lib/jenkins -not -user jenkins -exec chown -h jenkins:jenkins <nowiki>{</nowiki><nowiki>}</nowiki> +
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34767 and previous config saved to /var/cache/conftool/dbconfig/20220915-084046-root.json
* 07:35 mutante: contint2001 - find /var/lib/jenkins -group bacula -user jenkins -exec chown jenkins:jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 08:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 07:30 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34766 and previous config saved to /var/cache/conftool/dbconfig/20220915-082851-root.json
* 07:26 mutante: contint2001 - chown -R zuul:zuul /var/lib/zuul/
* 08:26 tsepothoabala@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832339{{!}}Enable action blocks on ptwiki (T317157)]] (duration: 04m 07s)
* 07:26 mutante: contint1001:~# rsync -avpz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34765 and previous config saved to /var/cache/conftool/dbconfig/20220915-082541-root.json
* 07:25 mutante: contint1001:~# rsync -avp --delete /var/lib/jenkins/ rsync://contint2001.wikimedia.org/ci--var-lib-jenkins-
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34764 and previous config saved to /var/cache/conftool/dbconfig/20220915-082112-root.json
* 07:25 mutante: contint1001:~# rsync -avp --delete /var/lib/zuul/ rsync://contint2001.wikimedia.org/ci--var-lib-zuul-
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34763 and previous config saved to /var/cache/conftool/dbconfig/20220915-081627-root.json
* 07:18 moritzm: installing bind security updates (only client-side tools/libraries in use)
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 codfw master [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34762 and previous config saved to /var/cache/conftool/dbconfig/20220915-081517-marostegui.json
* 07:04 elukey: matomo upgraded to 3.13.5 on matomo1001 - [[phab:T252741|T252741]]
* 08:14 marostegui: Starting s6 codfw failover from db2129 to db2114 - [[phab:T317850|T317850]]
* 06:57 elukey: update matomo on stretch-wikimedia to 3.13.5
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34761 and previous config saved to /var/cache/conftool/dbconfig/20220915-081036-root.json
* 06:10 elukey@deploy1001: Finished deploy [analytics/superset/deploy@369a2dd]: Upgrade Superset to 0.36 - second attempt (duration: 00m 57s)
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34760 and previous config saved to /var/cache/conftool/dbconfig/20220915-080607-root.json
* 06:09 elukey@deploy1001: Started deploy [analytics/superset/deploy@369a2dd]: Upgrade Superset to 0.36 - second attempt
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 from API [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34759 and previous config saved to /var/cache/conftool/dbconfig/20220915-080157-root.json
* 05:17 marostegui: Remove tmp_3 key from enwiki.recentchanges on db1099:3311 - [[phab:T206103|T206103]]
* 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary codfw s6 [[phab:T317850|T317850]]
* 04:41 _joe_: cassandra cannot start on restbase2009, one of the disk is failed.
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34758 and previous config saved to /var/cache/conftool/dbconfig/20220915-080122-root.json
* 04:39 _joe_: restarting cassandra instances on restbase2009, has a broken disk
* 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary codfw s6 [[phab:T317850|T317850]]
* 04:20 marostegui: Depool labsdb1011 - [[phab:T249188|T249188]]
* 07:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34757 and previous config saved to /var/cache/conftool/dbconfig/20220915-075531-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34756 and previous config saved to /var/cache/conftool/dbconfig/20220915-075102-root.json
* 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
* 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34755 and previous config saved to /var/cache/conftool/dbconfig/20220915-074026-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34754 and previous config saved to /var/cache/conftool/dbconfig/20220915-073557-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34753 and previous config saved to /var/cache/conftool/dbconfig/20220915-072520-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34752 and previous config saved to /var/cache/conftool/dbconfig/20220915-072053-root.json
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: reboot
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2151.codfw.wmnet with reason: reboot
* 07:14 moritzm: installing zlib security updates
* 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34751 and previous config saved to /var/cache/conftool/dbconfig/20220915-071015-root.json
* 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 07:06 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34750 and previous config saved to /var/cache/conftool/dbconfig/20220915-070548-root.json
* 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34749 and previous config saved to /var/cache/conftool/dbconfig/20220915-065510-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34748 and previous config saved to /var/cache/conftool/dbconfig/20220915-065043-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db2096 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34747 and previous config saved to /var/cache/conftool/dbconfig/20220915-064750-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34746 and previous config saved to /var/cache/conftool/dbconfig/20220915-064635-marostegui.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2096 to x1 primary and set section read-write [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34745 and previous config saved to /var/cache/conftool/dbconfig/20220915-064525-root.json
* 06:44 marostegui: Starting x1 codfw failover from db2115 to db2096 - [[phab:T317842|T317842]]
* 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T317842|T317842]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2096 with weight 0 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34744 and previous config saved to /var/cache/conftool/dbconfig/20220915-064014-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T317842|T317842]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34743 and previous config saved to /var/cache/conftool/dbconfig/20220915-063538-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34742 and previous config saved to /var/cache/conftool/dbconfig/20220915-061421-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 codfw [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34741 and previous config saved to /var/cache/conftool/dbconfig/20220915-061317-marostegui.json
* 06:12 marostegui: Starting s3 codfw failover from db2105 to db2127 - [[phab:T317839|T317839]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34740 and previous config saved to /var/cache/conftool/dbconfig/20220915-060307-root.json
* 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Codfw switchover s3 [[phab:T317839|T317839]]
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Codfw switchover s3 [[phab:T317839|T317839]]
* 05:32 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]


== 2020-05-26 ==
== 2022-09-14 ==
* 21:34 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|I0fb124b3593}} (duration: 01m 05s)
* 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34739 and previous config saved to /var/cache/conftool/dbconfig/20220914-220822-ladsgroup.json
* 21:30 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2714e2ae26404}} (duration: 01m 06s)
* 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
* 21:18 krinkle@deploy1001: Synchronized wmf-config/profiler.php: {{Gerrit|Ib0bf8d97b10b}}, [[phab:T253674|T253674]] (duration: 01m 06s)
* 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
* 20:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]]
* 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 20:08 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]] (duration: 70m 02s)
* 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 18:58 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]]
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34738 and previous config saved to /var/cache/conftool/dbconfig/20220914-220744-ladsgroup.json
* 18:07 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.30 (duration: 20m 45s)
* 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34737 and previous config saved to /var/cache/conftool/dbconfig/20220914-215238-ladsgroup.json
* 18:02 bblack: cr[12]-eqiad: re-route ns0.wikimedia.org to authdns1001 - [[phab:T241770|T241770]]
* 21:38 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 01m 48s)
* 18:02 ejegg: restarted fundraising jobs: recurring charge, audit processing, deduplication
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34736 and previous config saved to /var/cache/conftool/dbconfig/20220914-213732-ladsgroup.json
* 17:57 moritzm: installing bind security updates for stretch (only client-side tools/libraries in use)
* 21:37 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
* 17:47 cdanis: netflow3001: disabling puppet and testing some pmacct/librdkafka config tweaks [[phab:T253128|T253128]]
* 21:36 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
* 17:16 James_F: 1.35.0-wmf.34 was branched at {{Gerrit|b5012a1e7d0bbd2bf7444b8708d421992bcbe2fb}} for [[phab:T253022|T253022]]
* 21:35 dduvall@deploy1002: Installation of scap version "4.19.1" completed for 561 hosts
* 16:45 moritzm: installing jsp-api bugfix update from Buster point release
* 21:35 dduvall@deploy1002: Installing scap version "4.19.1" for 561 hosts
* 15:22 akosiaris: sync kubernetes eqiad namespaces configuration with helmfile
* 21:34 dduvall: Deploying scap 4.19.1 (https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/832297/1/changelog)
* 15:15 akosiaris: sync kubernetes codfw namespaces configuration with helmfile
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34735 and previous config saved to /var/cache/conftool/dbconfig/20220914-212225-ladsgroup.json
* 15:08 arturo: delete/re-import docker/containerd.io packages in the right version in buster-wikimedia/thirdparty/kubeadm-k8s-1-<nowiki>{</nowiki>15,16<nowiki>}</nowiki> ([[phab:T250866|T250866]])
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:08 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add lazy-loading to Wikimedia Foundation powered-by icon [[phab:T239377|T239377]] (duration: 00m 57s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:01 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Drop enwiki mobile mainpage special casing [[phab:T32405|T32405]] (duration: 00m 59s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:58 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:57 akosiaris: sync staging namespaces configuration
* 20:44 dancy@deploy1002: Sync cancelled.
* 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 20:44 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 20:44 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 20:44 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:56 jforrester@deploy1001: Synchronized docroot/noc/: Clear out symlink to mobile.php, now removed (duration: 00m 55s)
* 20:40 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:56 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:40 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:54 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 20:39 dancy@deploy1002: Started scap: testing
* 14:53 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Move mobile.php into CommonSettings.php (duration: 00m 57s)
* 20:38 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.1 refs [[phab:T314190|T314190]] (duration: 05m 49s)
* 14:44 arturo: upgrade packages in buster-wikimedia/thirdpardy/kubeadm-k8s-1-16 ([[phab:T246122|T246122]])
* 20:34 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:44 jforrester@deploy1001: Synchronized docroot/noc/: Clear out symlink to mobile-labs.php, now removed (duration: 00m 58s)
* 20:34 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:43 moritzm: installing rails security updates
* 20:34 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:41 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Don't try to load mobile-labs.php from mobile.php (duration: 00m 57s)
* 20:33 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:38 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings.php: Move uncondition/no-sideeffect includes up (duration: 00m 57s)
* 20:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 14:35 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean up MWMultiVersion check in CommonSettings.php (duration: 00m 59s)
* 20:28 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:33 XioNoX: test bgp med on dns4002
* 20:24 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SpecialVersionVersionUrl: Don't use confusing local variable name (duration: 00m 58s)
* 20:24 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:30 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Remove EOL REL1_32 (duration: 00m 58s)
* 20:21 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 20:19 dancy@deploy1002: deploy-promote aborted: (duration: 08m 52s)
* 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.32
* 20:19 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 01m 24s)
* 12:43 godog: swift eqiad-prod: decom ms-be101[678] - [[phab:T252008|T252008]]
* 20:18 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:21 XioNoX: repool ulsfo - [[phab:T243080|T243080]]
* 20:18 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 12:11 XioNoX: cr4-ulsfo re-activate transit/ix/4/6 - [[phab:T243080|T243080]]
* 20:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:03 XioNoX: cr4-ulsfo> request vmhost reboot - [[phab:T243080|T243080]]
* 20:13 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:01 XioNoX: cr4-ulsfo deactivate transit/ix/4/6 - [[phab:T243080|T243080]]
* 20:13 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:49 XioNoX: cr3-ulsfo> request vmhost reboot - [[phab:T243080|T243080]]
* 20:12 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:42 XioNoX: cr4-ulsfo> request vmhost software add ... - [[phab:T243080|T243080]]
* 20:09 dancy@deploy1002: Sync cancelled.
* 11:28 XioNoX: cr3-ulsfo> request vmhost software add ... - [[phab:T243080|T243080]]
* 20:09 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:27 awight: nnwiki updateCollation.php script has finished.
* 20:09 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:26 XioNoX: depool ulsfo for routers upgrade - [[phab:T243080|T243080]]
* 20:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:16 awight: EU SWAT done (pending a maintenance script to updateCollation)
* 20:02 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:14 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:598553{{!}}Add 'deletedtext' permission to researcher group (T253420)]] (duration: 01m 06s)
* 20:02 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:06 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:598509{{!}}[nnwiki] Change category collation to  (T253559)]] (duration: 01m 10s)
* 20:02 dancy@deploy1002: Started scap: testing
* 10:46 marostegui: Stop tendril's event scheduler
* 20:01 TheresNoTime: Nothing to deploy in this UTC late backport window
* 10:18 jynus: stop db2097 for hw maintenance [[phab:T252492|T252492]]
* 19:57 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
* 09:48 vgutierrez: rolling upgrade to ats 8.0.7-1wm11
* 19:57 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
* 09:41 _joe_: all jobrunners converted to use envoy for TLS termination
* 19:55 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 12s)
* 09:38 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw131[0-1].eqiad.wmnet
* 19:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:38 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw133[4-8].eqiad.wmnet
* 19:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:37 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw130[0-9].eqiad.wmnet
* 19:51 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:37 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw130[0-3].eqiad.wmnet
* 19:49 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:36 oblivian@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=mw129[3-9].eqiad.wmnet
* 19:49 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:31 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw130[0-3].eqiad.wmnet
* 19:49 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:27 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw130[4-7].eqiad.wmnet
* 19:48 dancy@deploy1002: Started scap: testing
* 09:22 gehel: repool wdqs1007, catched up on lag
* 19:46 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 23s)
* 09:09 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw13(0[89]{{!}}1[01]).eqiad.wmnet
* 19:46 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:39 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:02 mutante: decom'ing people1001 - replaced by people1002
* 19:39 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:01 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:39 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:01 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw13(1{{!}}3)8.eqiad.wmnet
* 19:38 dancy@deploy1002: Started scap: testing
* 08:57 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw133[4-7].eqiad.wmnet
* 19:38 dancy@deploy1002: sync-world aborted: testing (duration: 13m 25s)
* 08:55 _joe_: progressively converting jobrunners to envoy
* 19:35 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:41 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw1337.eqiad.wmnet
* 19:26 dancy: dancy@deploy1002 touch /var/lib/deploy-mwdebug/pause
* 07:20 moritzm: installing libssh security updates
* 19:24 dancy@deploy1002: Started scap: testing
* 07:03 vgutierrez: upgrade to ats 8.0.7-1wm11 on cp3064 and cp3065
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:49 marostegui: Deploy schema change on s3 directly on the master with 1 minute sleep in between wikis [[phab:T253342|T253342]]
* 19:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling (duration: 02m 03s)
* 06:47 marostegui: Deploy schema change on s1 directly on the master [[phab:T253342|T253342]]
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:44 marostegui: Deploy schema change on s4 directly on the master [[phab:T253342|T253342]]
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:35 XioNoX: reboot scs-ulsfo - [[phab:T253609|T253609]]
* 19:15 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling
* 06:29 marostegui: Deploy schema change on s7 directly on the master [[phab:T253342|T253342]]
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:24 marostegui: Deploy schema change on s8 directly on the master [[phab:T253342|T253342]]
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:01 marostegui: Deploy schema change on s2 directly on the master [[phab:T253342|T253342]]
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:35 marostegui: Repool labsdb1011 - [[phab:T249188|T249188]]
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:14 marostegui: Stop slaves and stop mysql on labsdb1011 [[phab:T249188|T249188]]
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:55 tstarling@deploy1001: Synchronized php-1.35.0-wmf.31/includes/export/XmlDumpWriter.php: [[phab:T253468|T253468]] (duration: 01m 06s)
* 18:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 03:53 tstarling@deploy1001: Synchronized php-1.35.0-wmf.32/includes/export/XmlDumpWriter.php: [[phab:T253468|T253468]] (duration: 01m 07s)
* 18:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki (duration: 02m 05s)
* 03:20 tstarling@deploy1001: Synchronized php-1.35.0-wmf.32/includes/specials/SpecialChangeContentModel.php: for UBN [[phab:T252963|T252963]] (duration: 01m 07s)
* 18:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki
* 03:18 tstarling@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 32s)
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:36 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 04m 41s)
* 18:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:31 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 16:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 16:10 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 16:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
* 16:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2024.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2024.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2002.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2002.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1026.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1026.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1029.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1029.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1028.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1028.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1027.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1027.eqiad.wmnet on all recursors
* 15:55 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2001.codfw.wmnet on all recursors
* 15:55 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2001.codfw.wmnet on all recursors
* 15:50 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 00m 39s)
* 15:49 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
* 15:48 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
* 15:24 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:22 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 15:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:17 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34732 and previous config saved to /var/cache/conftool/dbconfig/20220914-145956-ladsgroup.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34731 and previous config saved to /var/cache/conftool/dbconfig/20220914-144449-ladsgroup.json
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34730 and previous config saved to /var/cache/conftool/dbconfig/20220914-142941-ladsgroup.json
* 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34729 and previous config saved to /var/cache/conftool/dbconfig/20220914-141434-ladsgroup.json
* 14:06 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 13:59 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 13:48 moritzm: imported zlib 1:1.2.8.dfsg-5+deb9u1+wmf1 to apt.wikimedia.org
* 13:40 Lucas_WMDE: UTC afternoon backport+config window done
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bnwiktionary --fix # [[phab:T317745|T317745]] – dry run result: 6043 links to fix, 6043 were resolvable, 0 were deleted
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:831970{{!}}Move namespace in the Bengali Wiktionary: উইকিসরাস → পরিশিষ্ট and set wgNamespaceAliases for newly created namespaces (T317745)]] (duration: 03m 41s)
* 13:28 topranks: upgrading routinator on rpki2002
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832145{{!}}Enable Content/Section translation on WPs with new MT support from Google (T313296)]] (duration: 03m 39s)
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:831872{{!}}Enable Section Translation in Odia Wikipedia (T313300)]] (duration: 03m 55s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:54 jayme: imported rsyslog 8.2208.0-1~bpo11+1 into bullseye-wikimedia component/rsyslog-k8s - [[phab:T289766|T289766]]
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34725 and previous config saved to /var/cache/conftool/dbconfig/20220914-115920-ladsgroup.json
* 11:49 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-eqdfw,cr2-eqdfw IPv6
* 11:49 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-eqdfw,cr2-eqdfw IPv6
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34723 and previous config saved to /var/cache/conftool/dbconfig/20220914-114413-ladsgroup.json
* 11:29 topranks: rebooting cr2-eqdfw to complete upgrade
* 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34721 and previous config saved to /var/cache/conftool/dbconfig/20220914-112907-ladsgroup.json
* 11:14 topranks: Shutting down internet transit and peering on cr2-eqdfw in advance of upgrade reboot
* 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34719 and previous config saved to /var/cache/conftool/dbconfig/20220914-111400-ladsgroup.json
* 11:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
* 11:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
* 11:01 topranks: Prepping to upgrade JunOS on cr2-eqdfw.  Adjusting OSPF costs to force traffic via alternate POPs.
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34717 and previous config saved to /var/cache/conftool/dbconfig/20220914-103810-root.json
* 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:24 kharlan@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:831969{{!}}BlockMetrics: Update to new event schema version (T306018)]] (duration: 03m 48s)
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34715 and previous config saved to /var/cache/conftool/dbconfig/20220914-102305-root.json
* 10:18 moritzm: import routinator 0.11.3-1bullseye  to thirdparty/routinator
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34714 and previous config saved to /var/cache/conftool/dbconfig/20220914-100800-root.json
* 10:00 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 09:59 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 09:58 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:57 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:57 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 09:53 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 09:53 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34713 and previous config saved to /var/cache/conftool/dbconfig/20220914-095255-root.json
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34712 and previous config saved to /var/cache/conftool/dbconfig/20220914-093750-root.json
* 09:27 moritzm: installing zlib/libxslt security updates on buster
* 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34711 and previous config saved to /var/cache/conftool/dbconfig/20220914-092620-ladsgroup.json
* 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34710 and previous config saved to /var/cache/conftool/dbconfig/20220914-092558-ladsgroup.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34709 and previous config saved to /var/cache/conftool/dbconfig/20220914-092245-root.json
* 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34708 and previous config saved to /var/cache/conftool/dbconfig/20220914-091052-ladsgroup.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34707 and previous config saved to /var/cache/conftool/dbconfig/20220914-090740-root.json
* 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 09:05 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34706 and previous config saved to /var/cache/conftool/dbconfig/20220914-085545-ladsgroup.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34705 and previous config saved to /var/cache/conftool/dbconfig/20220914-085235-root.json
* 08:50 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-test
* 08:49 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-test
* 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34704 and previous config saved to /var/cache/conftool/dbconfig/20220914-084039-ladsgroup.json
* 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart on A:wdqs-test
* 08:38 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart on A:wdqs-test
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
* 08:32 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]] (duration: 06m 51s)
* 08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:25 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]]
* 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1024.eqiad.wmnet with reason: down
* 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1024.eqiad.wmnet with reason: down
* 08:02 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es5 [[phab:T317739|T317739]] (duration: 03m 38s)
* 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34703 and previous config saved to /var/cache/conftool/dbconfig/20220914-075722-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34702 and previous config saved to /var/cache/conftool/dbconfig/20220914-075550-marostegui.json
* 07:55 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T317739|T317739]]
* 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:50 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es5 [[phab:T317739|T317739]] (duration: 04m 13s)
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1023 with weight 0 [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34701 and previous config saved to /var/cache/conftool/dbconfig/20220914-074617-marostegui.json
* 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T317739|T317739]]
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T317739|T317739]]
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34700 and previous config saved to /var/cache/conftool/dbconfig/20220914-074248-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34699 and previous config saved to /var/cache/conftool/dbconfig/20220914-072743-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34698 and previous config saved to /var/cache/conftool/dbconfig/20220914-071238-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34697 and previous config saved to /var/cache/conftool/dbconfig/20220914-065733-root.json
* 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34696 and previous config saved to /var/cache/conftool/dbconfig/20220914-064330-ladsgroup.json
* 06:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34695 and previous config saved to /var/cache/conftool/dbconfig/20220914-064309-ladsgroup.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34694 and previous config saved to /var/cache/conftool/dbconfig/20220914-064228-root.json
* 06:38 elukey: restart kafka on kafka-logging2003 to pick up  the new PKI TLS settings
* 06:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
* 06:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
* 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34693 and previous config saved to /var/cache/conftool/dbconfig/20220914-062802-ladsgroup.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34692 and previous config saved to /var/cache/conftool/dbconfig/20220914-062723-root.json
* 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34691 and previous config saved to /var/cache/conftool/dbconfig/20220914-061256-ladsgroup.json
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: down
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: down
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34690 and previous config saved to /var/cache/conftool/dbconfig/20220914-060913-root.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 codfw primary [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34689 and previous config saved to /var/cache/conftool/dbconfig/20220914-060807-marostegui.json
* 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34688 and previous config saved to /var/cache/conftool/dbconfig/20220914-055749-ladsgroup.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34687 and previous config saved to /var/cache/conftool/dbconfig/20220914-055156-marostegui.json
* 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T317735|T317735]]
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T317735|T317735]]
* 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34686 and previous config saved to /var/cache/conftool/dbconfig/20220914-052510-ladsgroup.json
* 05:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 05:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34685 and previous config saved to /var/cache/conftool/dbconfig/20220914-052448-ladsgroup.json
* 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34684 and previous config saved to /var/cache/conftool/dbconfig/20220914-050942-ladsgroup.json
* 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34683 and previous config saved to /var/cache/conftool/dbconfig/20220914-045435-ladsgroup.json
* 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34682 and previous config saved to /var/cache/conftool/dbconfig/20220914-043929-ladsgroup.json
* 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34681 and previous config saved to /var/cache/conftool/dbconfig/20220914-035624-ladsgroup.json
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34680 and previous config saved to /var/cache/conftool/dbconfig/20220914-035546-ladsgroup.json
* 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34679 and previous config saved to /var/cache/conftool/dbconfig/20220914-034040-ladsgroup.json
* 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34678 and previous config saved to /var/cache/conftool/dbconfig/20220914-033921-ladsgroup.json
* 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34677 and previous config saved to /var/cache/conftool/dbconfig/20220914-032533-ladsgroup.json
* 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34676 and previous config saved to /var/cache/conftool/dbconfig/20220914-032415-ladsgroup.json
* 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34675 and previous config saved to /var/cache/conftool/dbconfig/20220914-031027-ladsgroup.json
* 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34674 and previous config saved to /var/cache/conftool/dbconfig/20220914-030908-ladsgroup.json
* 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34673 and previous config saved to /var/cache/conftool/dbconfig/20220914-025402-ladsgroup.json
* 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34672 and previous config saved to /var/cache/conftool/dbconfig/20220914-013204-ladsgroup.json
* 01:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 01:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34671 and previous config saved to /var/cache/conftool/dbconfig/20220914-013143-ladsgroup.json
* 01:24 eileen: civicrm upgraded from {{Gerrit|d91b4a2c}} to {{Gerrit|e82d9cd0}}
* 01:18 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34670 and previous config saved to /var/cache/conftool/dbconfig/20220914-011637-ladsgroup.json
* 01:14 ejegg: disabled delete_deleted_contacts job (will take effect when current job ends)
* 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34669 and previous config saved to /var/cache/conftool/dbconfig/20220914-010130-ladsgroup.json
* 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34668 and previous config saved to /var/cache/conftool/dbconfig/20220914-004624-ladsgroup.json


== 2020-05-25 ==
== 2022-09-13 ==
* 23:34 ejegg: re-enabled fundraising queue consumers and job runners, except audits, dedupe, and recurring
* 23:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34667 and previous config saved to /var/cache/conftool/dbconfig/20220913-234607-ladsgroup.json
* 21:38 eileen: civicrm revision changed from {{Gerrit|5428c5c449}} to {{Gerrit|d1cd99166f}}, config revision is {{Gerrit|6b05d6bb25}}
* 23:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 21:18 eileen: civicrm revision is {{Gerrit|7380e0e8ce}}, config revision is {{Gerrit|6b05d6bb25}}
* 23:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 21:01 ejegg: updated fundraising CiviCRM from {{Gerrit|737d88a5ee}} to {{Gerrit|7380e0e8ce}}
* 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34666 and previous config saved to /var/cache/conftool/dbconfig/20220913-234546-ladsgroup.json
* 17:17 ejegg: updated fundraising CiviCRM from {{Gerrit|6b1d5902dd}} to {{Gerrit|737d88a5ee}}
* 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34665 and previous config saved to /var/cache/conftool/dbconfig/20220913-233039-ladsgroup.json
* 17:09 ejegg: enabled contribution tracking queue on payments-wiki
* 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34664 and previous config saved to /var/cache/conftool/dbconfig/20220913-231533-ladsgroup.json
* 16:24 ejegg: updated standalone SmashPig from {{Gerrit|2702b04329}} to {{Gerrit|44690f761c}}
* 23:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 16:17 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 23:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 16:16 XioNoX: enable IX4/6 BGP group on cr4-ulsfo - [[phab:T237575|T237575]]
* 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34663 and previous config saved to /var/cache/conftool/dbconfig/20220913-231257-ladsgroup.json
* 16:00 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34662 and previous config saved to /var/cache/conftool/dbconfig/20220913-230317-ladsgroup.json
* 15:55 XioNoX: disable IX4/6 BGP group on cr4-ulsfo - [[phab:T237575|T237575]]
* 23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 15:17 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 23:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
* 15:15 ejegg: updated payments-wiki from {{Gerrit|3c465cb11c}} to {{Gerrit|d11efeb1cf}}, put it into maintenance mode
* 23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34661 and previous config saved to /var/cache/conftool/dbconfig/20220913-230255-ladsgroup.json
* 15:15 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34660 and previous config saved to /var/cache/conftool/dbconfig/20220913-230026-ladsgroup.json
* 14:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P34659 and previous config saved to /var/cache/conftool/dbconfig/20220913-225750-ladsgroup.json
* 14:39 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P34658 and previous config saved to /var/cache/conftool/dbconfig/20220913-224749-ladsgroup.json
* 14:06 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P34657 and previous config saved to /var/cache/conftool/dbconfig/20220913-224244-ladsgroup.json
* 14:00 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 22:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P34656 and previous config saved to /var/cache/conftool/dbconfig/20220913-223241-ladsgroup.json
* 13:46 _joe_: uploaded doxygen 1.8.17-1 to wikimedia-buster component/ci
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34655 and previous config saved to /var/cache/conftool/dbconfig/20220913-223025-ladsgroup.json
* 13:43 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift
* 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 13:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 13:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34654 and previous config saved to /var/cache/conftool/dbconfig/20220913-222738-ladsgroup.json
* 13:10 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 vgutierrez: upgrade ATS to version 8.0.7-1wm11 on cp4026 and cp4032
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:52 godog: roll-restart pybal in low-traffic codfw
* 22:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:44 ema: upload atskafka 0.7 to buster-wikimedia, upgrade cp3050 [[phab:T253551|T253551]]
* 22:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:37 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34653 and previous config saved to /var/cache/conftool/dbconfig/20220913-221734-ladsgroup.json
* 12:30 marostegui: Deploy schema change on s5 directly on the master [[phab:T253342|T253342]]
* 22:16 dancy: dancy@deploy1002$ rm /var/lib/deploy-mwdebug/pause
* 12:14 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 22:15 dancy@deploy1002: Sync cancelled.
* 12:09 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 22:15 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 12:01 _joe_: converting the remaining appservers to use envoy for TLS termination
* 22:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:57 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 22:14 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:54 marostegui: Install a new tendril_purge_global_status_log event on db1115 (tendril) [[phab:T252331|T252331]]
* 22:14 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 22:13 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 22:12 dancy@deploy1002: Started scap: testing
* 11:48 marostegui: Stop event scheduler on db1115 (tendril) - [[phab:T252331|T252331]]
* 22:12 dancy@deploy1002: Sync cancelled.
* 11:46 moritzm: uploaded CAS 6.1.5-1 to apt.wikimedia.org [[phab:T233947|T233947]]
* 22:11 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 11:36 _joe_: switch mw[1349-1355,1364-1373].eqiad.wmnet to envoy
* 22:11 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:27 marostegui: Extend /srv 1100G on db213[6-9] [[phab:T252985|T252985]]
* 22:11 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:23 marostegui: Extend /srv 1100G on db114[1-9] [[phab:T252512|T252512]]
* 22:10 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:21 marostegui: Extend db1141's (temporary labsdb test host) /srv 1TB extra - [[phab:T249188|T249188]]
* 22:08 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 22:07 dancy@deploy1002: Started scap: testing
* 11:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 22:07 dancy@deploy1002: Sync cancelled.
* 11:01 ema: upload prometheus-rdkafka-exporter to buster-wikimedia [[phab:T253197|T253197]]
* 22:07 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 10:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:598439{{!}} Bumping portals to master (598439)]] (duration: 01m 05s)
* 22:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:598439{{!}} Bumping portals to master (598439)]] (duration: 01m 06s)
* 22:06 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:20 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 22:05 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:56 _joe_: transition done
* 22:03 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:49 _joe_: depooled mw1337, it was getting all traffic supposed to go to the jobrunners
* 22:02 dancy@deploy1002: Started scap: testing
* 09:45 vgutierrez: upload trafficserver 8.0.7-1wm10 to apt.wm.o (buster)
* 22:01 dancy@deploy1002: Sync cancelled.
* 09:42 _joe_: converting mw1319-1333 to use envoy for TLS termination
* 22:01 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:17 _joe_: migrated mw1337 to use envoy for TLS termination [[phab:T247389|T247389]]
* 22:01 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:10 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 21:58 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 godog: turn on sni by default for check_http --ssl icinga invocations - [[phab:T253292|T253292]]
* 21:58 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:52 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 21:55 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:39 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 21:55 dancy@deploy1002: Started scap: testing
* 08:21 filippo@cumin1001: conftool action : set/pooled=yes:weight=100; selector: service=thanos-swift
* 21:55 dancy@deploy1002: Sync cancelled.
* 08:05 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 21:54 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 07:36 moritzm: installed linux-image-amd64 on labstore1005 (current meta package for kernels following the Stretch update) [[phab:T224582|T224582]]
* 21:54 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:36 moritzm: installed linux-imageamd64 on labstore (current meta package for kernels following the Stretch update) [[phab:T224582|T224582]]
* 21:50 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:02 marostegui: Stop event scheduler on tendril [[phab:T252331|T252331]]
* 21:50 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:11 marostegui: Deploy schema change on s6, directly on the master - [[phab:T253342|T253342]]
* 21:48 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:54 marostegui: Depool labsdb1011 - [[phab:T249188|T249188]]
* 21:47 dancy@deploy1002: Started scap: testing
* 04:11 kart_: Updated cxserver to 2020-05-22-083137-production ([[phab:T246317|T246317]], [[phab:T252871|T252871]])
* 21:37 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ MW_CONFIG_BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restric
* 04:07 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 21:36 dancy@deploy1002: Started scap: testing
* 04:04 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:02 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 21:16 dancy: dancy@deploy1002  touch /var/lib/deploy-mwdebug/pause
* 21:16 dancy@deploy1002: Sync cancelled.
* 21:15 dancy@deploy1002: dancy: testing [[phab:T299648|T299648]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:14 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:14 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:10 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:04 dancy@deploy1002: Started scap: testing [[phab:T299648|T299648]]
* 20:25 cjming: end of UTC late backport window
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:14 cjming@deploy1002: Finished scap: Backport for [[gerrit:831223{{!}}add tagline and update wordmark in ptwikinews (T313174)]] (duration: 05m 50s)
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:09 cjming@deploy1002: cjming and aishik: Backport for [[gerrit:831223{{!}}add tagline and update wordmark in ptwikinews (T313174)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:09 cjming@deploy1002: Started scap: Backport for [[gerrit:831223{{!}}add tagline and update wordmark in ptwikinews (T313174)]]
* 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34652 and previous config saved to /var/cache/conftool/dbconfig/20220913-200344-ladsgroup.json
* 20:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 20:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34651 and previous config saved to /var/cache/conftool/dbconfig/20220913-200214-ladsgroup.json
* 20:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
* 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34650 and previous config saved to /var/cache/conftool/dbconfig/20220913-200152-ladsgroup.json
* 19:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P34649 and previous config saved to /var/cache/conftool/dbconfig/20220913-194645-ladsgroup.json
* 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P34648 and previous config saved to /var/cache/conftool/dbconfig/20220913-193139-ladsgroup.json
* 19:19 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34647 and previous config saved to /var/cache/conftool/dbconfig/20220913-191632-ladsgroup.json
* 19:01 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 18:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34646 and previous config saved to /var/cache/conftool/dbconfig/20220913-183259-ladsgroup.json
* 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34645 and previous config saved to /var/cache/conftool/dbconfig/20220913-183238-ladsgroup.json
* 18:31 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:831941{{!}}InitialiseSettings-labs.php: Set $wgPhonosPath (T317417)]] (duration: 03m 45s)
* 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:28 TheresNoTime: deploying a beta cluster only config change, [[phab:T317417|T317417]]
* 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P34644 and previous config saved to /var/cache/conftool/dbconfig/20220913-181731-ladsgroup.json
* 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P34643 and previous config saved to /var/cache/conftool/dbconfig/20220913-180225-ladsgroup.json
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34642 and previous config saved to /var/cache/conftool/dbconfig/20220913-174718-ladsgroup.json
* 17:43 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Upgrade wmf-netbox plugin - volans@cumin1001
* 17:41 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Upgrade wmf-netbox plugin - volans@cumin1001
* 17:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 17:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
* 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34640 and previous config saved to /var/cache/conftool/dbconfig/20220913-173721-ladsgroup.json
* 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34639 and previous config saved to /var/cache/conftool/dbconfig/20220913-173254-ladsgroup.json
* 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P34638 and previous config saved to /var/cache/conftool/dbconfig/20220913-172215-ladsgroup.json
* 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34637 and previous config saved to /var/cache/conftool/dbconfig/20220913-171747-ladsgroup.json
* 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P34636 and previous config saved to /var/cache/conftool/dbconfig/20220913-170708-ladsgroup.json
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P34635 and previous config saved to /var/cache/conftool/dbconfig/20220913-170241-ladsgroup.json
* 16:56 ejegg: updated fundraising CiviCRM from {{Gerrit|efbbcb57}} to {{Gerrit|d91b4a2c}}
* 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34634 and previous config saved to /var/cache/conftool/dbconfig/20220913-165202-ladsgroup.json
* 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34633 and previous config saved to /var/cache/conftool/dbconfig/20220913-165117-ladsgroup.json
* 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
* 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34632 and previous config saved to /var/cache/conftool/dbconfig/20220913-165056-ladsgroup.json
* 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34631 and previous config saved to /var/cache/conftool/dbconfig/20220913-164734-ladsgroup.json
* 16:37 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:36 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:36 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 16:36 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P34630 and previous config saved to /var/cache/conftool/dbconfig/20220913-163549-ladsgroup.json
* 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P34629 and previous config saved to /var/cache/conftool/dbconfig/20220913-162043-ladsgroup.json
* 16:13 godog: add 200G to prometheus/eqiad instance ops
* 16:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 16:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Maintenance
* 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet with reason: Maintenance
* 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
* 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
* 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34628 and previous config saved to /var/cache/conftool/dbconfig/20220913-160536-ladsgroup.json
* 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: down
* 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: down
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34626 and previous config saved to /var/cache/conftool/dbconfig/20220913-154810-root.json
* 15:42 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@031604d]: Automatically drop hitsorical partitions of subgraph analysis (duration: 02m 07s)
* 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34625 and previous config saved to /var/cache/conftool/dbconfig/20220913-154151-ladsgroup.json
* 15:40 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@031604d]: Automatically drop hitsorical partitions of subgraph analysis
* 15:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P34624 and previous config saved to /var/cache/conftool/dbconfig/20220913-152644-ladsgroup.json
* 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:14 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28  refs [[phab:T314190|T314190]] (duration: 04m 31s)
* 15:13 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.0 - volans@cumin1001
* 15:12 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.0 - volans@cumin1001
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P34623 and previous config saved to /var/cache/conftool/dbconfig/20220913-151138-ladsgroup.json
* 15:10 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28  refs [[phab:T314190|T314190]]
* 15:08 dancy@deploy1002: deploy-promote aborted:  (duration: 00m 02s)
* 14:59 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 04m 43s)
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34622 and previous config saved to /var/cache/conftool/dbconfig/20220913-145631-ladsgroup.json
* 14:54 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 14:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:47 dancy@deploy1002: deploy-promote aborted:  (duration: 01m 03s)
* 14:47 dancy@deploy1002: prep aborted:  (duration: 00m 12s)
* 14:46 moritzm: restarting FPM/Apache on mediawiki canaries
* 14:44 moritzm: installing libxslt security updates on buster
* 14:18 topranks: Core router upgrade in codfw complete - maintenance closed.
* 14:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt
* 14:12 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt
* 14:07 topranks: re-activating Transit on IX BGP on cr2-codfw
* 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34621 and previous config saved to /var/cache/conftool/dbconfig/20220913-135729-ladsgroup.json
* 13:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
* 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34620 and previous config saved to /var/cache/conftool/dbconfig/20220913-135707-ladsgroup.json
* 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P34619 and previous config saved to /var/cache/conftool/dbconfig/20220913-134201-ladsgroup.json
* 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34618 and previous config saved to /var/cache/conftool/dbconfig/20220913-133339-ladsgroup.json
* 13:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
* 13:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
* 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34617 and previous config saved to /var/cache/conftool/dbconfig/20220913-133317-ladsgroup.json
* 13:33 Lucas_WMDE: UTC afternoon backport+config window done
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P34616 and previous config saved to /var/cache/conftool/dbconfig/20220913-132654-ladsgroup.json
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:826234{{!}}testwiki: Add mediawiki.edit_attempt stream (T309013)]] (2/2) (duration: 03m 33s)
* 13:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:826234{{!}}testwiki: Add mediawiki.edit_attempt stream (T309013)]] (1/2) (duration: 03m 39s)
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 Emperor: set thanos ring replicas to 3.85 [[phab:T311690|T311690]]
* 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P34615 and previous config saved to /var/cache/conftool/dbconfig/20220913-131811-ladsgroup.json
* 13:14 topranks: Flipping back to RE0 on cr2-codfw (last disruptive switch)
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:824685{{!}}Remove $wgWMESearchRelevancePages]] (unused) (duration: 03m 53s)
* 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34614 and previous config saved to /var/cache/conftool/dbconfig/20220913-131148-ladsgroup.json
* 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
* 13:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
* 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
* 13:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
* 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
* 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet with reason: Maintenance
* 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
* 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
* 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P34613 and previous config saved to /var/cache/conftool/dbconfig/20220913-130304-ladsgroup.json
* 12:59 topranks: Switching active RE back to RE1 on cr1-codfw as firmware hadn't been loaded while it was master
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34612 and previous config saved to /var/cache/conftool/dbconfig/20220913-125745-ladsgroup.json
* 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34611 and previous config saved to /var/cache/conftool/dbconfig/20220913-125723-ladsgroup.json
* 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34610 and previous config saved to /var/cache/conftool/dbconfig/20220913-124758-ladsgroup.json
* 12:46 topranks: forcing non-graceful RE switchover on cr2-codfw as part of upgrade
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P34609 and previous config saved to /var/cache/conftool/dbconfig/20220913-124217-ladsgroup.json
* 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P34608 and previous config saved to /var/cache/conftool/dbconfig/20220913-122710-ladsgroup.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34607 and previous config saved to /var/cache/conftool/dbconfig/20220913-122415-root.json
* 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34606 and previous config saved to /var/cache/conftool/dbconfig/20220913-121204-ladsgroup.json
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34605 and previous config saved to /var/cache/conftool/dbconfig/20220913-120910-root.json
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34604 and previous config saved to /var/cache/conftool/dbconfig/20220913-120653-ladsgroup.json
* 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
* 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34603 and previous config saved to /var/cache/conftool/dbconfig/20220913-120632-ladsgroup.json
* 11:58 topranks: Disabling transit and ixp BGP on cr2-codfw in advance of software upgrade
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34602 and previous config saved to /var/cache/conftool/dbconfig/20220913-115405-root.json
* 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P34601 and previous config saved to /var/cache/conftool/dbconfig/20220913-115125-ladsgroup.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34600 and previous config saved to /var/cache/conftool/dbconfig/20220913-113900-root.json
* 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P34599 and previous config saved to /var/cache/conftool/dbconfig/20220913-113619-ladsgroup.json
* 11:34 hashar: Upgrading CI Jenkins [[phab:T317418|T317418]]
* 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34598 and previous config saved to /var/cache/conftool/dbconfig/20220913-112818-ladsgroup.json
* 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34597 and previous config saved to /var/cache/conftool/dbconfig/20220913-112355-root.json
* 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34596 and previous config saved to /var/cache/conftool/dbconfig/20220913-112112-ladsgroup.json
* 11:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: router upgrade
* 11:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: router upgrade
* 11:15 topranks: completed cr1-codfw upgrade, will proceed to cr2-codfw shortly
* 11:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt
* 11:14 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt
* 11:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:09 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.1/includes/libs/rdbms/ChronologyProtector.php: Backport: [[gerrit:831847{{!}}rdbms: Bump ChronologyProtector cache key version (T317606)]] (duration: 03m 49s)
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34595 and previous config saved to /var/cache/conftool/dbconfig/20220913-110850-root.json
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34594 and previous config saved to /var/cache/conftool/dbconfig/20220913-110755-ladsgroup.json
* 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34593 and previous config saved to /var/cache/conftool/dbconfig/20220913-110715-root.json
* 11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T317627|T317627]]', diff saved to https://phabricator.wikimedia.org/P34592 and previous config saved to /var/cache/conftool/dbconfig/20220913-105733-root.json
* 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 codfw primary [[phab:T317627|T317627]]', diff saved to https://phabricator.wikimedia.org/P34591 and previous config saved to /var/cache/conftool/dbconfig/20220913-105642-marostegui.json
* 10:56 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
* 10:56 marostegui: Starting s2 codfw failover from db2104 to db2107 - [[phab:T317627|T317627]]
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34590 and previous config saved to /var/cache/conftool/dbconfig/20220913-105210-root.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34589 and previous config saved to /var/cache/conftool/dbconfig/20220913-103705-root.json
* 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2107 from api [[phab:T317627|T317627]]', diff saved to https://phabricator.wikimedia.org/P34588 and previous config saved to /var/cache/conftool/dbconfig/20220913-103658-marostegui.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 [[phab:T317627|T317627]]', diff saved to https://phabricator.wikimedia.org/P34587 and previous config saved to /var/cache/conftool/dbconfig/20220913-103621-marostegui.json
* 10:35 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
* 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 [[phab:T317627|T317627]]
* 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 [[phab:T317627|T317627]]
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34586 and previous config saved to /var/cache/conftool/dbconfig/20220913-102232-ladsgroup.json
* 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34585 and previous config saved to /var/cache/conftool/dbconfig/20220913-102147-root.json
* 10:16 topranks: Flipping master RE on cr1-codfw to backup as part of upgrade
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34584 and previous config saved to /var/cache/conftool/dbconfig/20220913-100642-root.json
* 10:04 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
* 09:52 elukey: restart kafka on kafka-logging2002 to move it to PKI-based TLS certs
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34583 and previous config saved to /var/cache/conftool/dbconfig/20220913-095137-root.json
* 09:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2002.codfw.wmnet with reason: Kafka PKI upgrade
* 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2002.codfw.wmnet with reason: Kafka PKI upgrade
* 09:45 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
* 09:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
* 09:41 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 09:37 hashar: Restarting CI Jenkins on contint2001 (with new systemd service)
* 09:33 hashar: Enabling Puppet on contint2001 for Jenkins systemd change
* 09:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
* 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34582 and previous config saved to /var/cache/conftool/dbconfig/20220913-092904-ladsgroup.json
* 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
* 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34581 and previous config saved to /var/cache/conftool/dbconfig/20220913-092826-ladsgroup.json
* 09:25 hashar: Stopped Puppet on contint2001 for a Jenkins systemd change
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 [[phab:T317614|T317614]]', diff saved to https://phabricator.wikimedia.org/P34580 and previous config saved to /var/cache/conftool/dbconfig/20220913-092200-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary [[phab:T317614|T317614]]', diff saved to https://phabricator.wikimedia.org/P34579 and previous config saved to /var/cache/conftool/dbconfig/20220913-092032-root.json
* 09:19 marostegui: Starting s1 codfw failover from db2103 to db2112 - [[phab:T317614|T317614]]
* 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P34578 and previous config saved to /var/cache/conftool/dbconfig/20220913-091320-ladsgroup.json
* 09:11 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:11 volans@cumin1001: START - Cookbook sre.network.cf
* 09:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt with reason: router upgrade
* 09:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-codfw,cr1-codfw IPv6,re0.cr1-codfw.mgmt with reason: router upgrade
* 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P34577 and previous config saved to /var/cache/conftool/dbconfig/20220913-085814-ladsgroup.json
* 08:56 topranks: Flipping primary routing engine to RE1 on cr1-codfw (disruptive) as part of upgrade.
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 [[phab:T317614|T317614]]', diff saved to https://phabricator.wikimedia.org/P34576 and previous config saved to /var/cache/conftool/dbconfig/20220913-085456-marostegui.json
* 08:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 [[phab:T317614|T317614]]
* 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 [[phab:T317614|T317614]]
* 08:46 topranks: Disabled LVS/PyBal peerings on cr1-codfw ain advance of upgrade to router.
* 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
* 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34575 and previous config saved to /var/cache/conftool/dbconfig/20220913-084307-ladsgroup.json
* 08:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
* 08:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
* 08:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
* 08:27 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 08:27 cmooney@cumin1001: START - Cookbook sre.network.cf
* 08:17 moritzm: roll-restarting apache/FPM on mw canaries to pick up zlib security updates
* 08:15 topranks: de-pooling codfw ahead of core router upgrades at the site
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:18 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.28  refs [[phab:T314190|T314190]] (duration: 04m 29s)
* 07:14 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.28  refs [[phab:T314190|T314190]]
* 07:11 jhuneidi@deploy1002: deploy-promote aborted:  (duration: 00m 09s)
* 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34574 and previous config saved to /var/cache/conftool/dbconfig/20220913-065457-ladsgroup.json
* 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P34573 and previous config saved to /var/cache/conftool/dbconfig/20220913-063951-ladsgroup.json
* 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34572 and previous config saved to /var/cache/conftool/dbconfig/20220913-063908-ladsgroup.json
* 06:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
* 06:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 06:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P34571 and previous config saved to /var/cache/conftool/dbconfig/20220913-062444-ladsgroup.json
* 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34570 and previous config saved to /var/cache/conftool/dbconfig/20220913-060938-ladsgroup.json
* 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34569 and previous config saved to /var/cache/conftool/dbconfig/20220913-045832-ladsgroup.json
* 04:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 04:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
* 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34568 and previous config saved to /var/cache/conftool/dbconfig/20220913-045811-ladsgroup.json
* 04:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P34567 and previous config saved to /var/cache/conftool/dbconfig/20220913-044304-ladsgroup.json
* 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P34566 and previous config saved to /var/cache/conftool/dbconfig/20220913-042758-ladsgroup.json
* 04:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34565 and previous config saved to /var/cache/conftool/dbconfig/20220913-041251-ladsgroup.json
* 04:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.27 (duration: 01m 59s)
* 03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 35m 37s)
* 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34564 and previous config saved to /var/cache/conftool/dbconfig/20220913-022136-ladsgroup.json
* 02:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 02:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 02:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34563 and previous config saved to /var/cache/conftool/dbconfig/20220913-022114-ladsgroup.json
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P34562 and previous config saved to /var/cache/conftool/dbconfig/20220913-020608-ladsgroup.json
* 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P34561 and previous config saved to /var/cache/conftool/dbconfig/20220913-015102-ladsgroup.json
* 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34560 and previous config saved to /var/cache/conftool/dbconfig/20220913-013555-ladsgroup.json
* 00:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: syntax error in sudo
* 00:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: syntax error in sudo
* 00:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet with reason: syntax error in sudo
* 00:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2002.codfw.wmnet with reason: syntax error in sudo
* 00:48 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: syntax error in sudo
* 00:48 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: syntax error in sudo
* 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34559 and previous config saved to /var/cache/conftool/dbconfig/20220913-001908-ladsgroup.json
* 00:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
* 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34558 and previous config saved to /var/cache/conftool/dbconfig/20220913-001846-ladsgroup.json
* 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34557 and previous config saved to /var/cache/conftool/dbconfig/20220913-000340-ladsgroup.json


== 2020-05-24 ==
== 2022-09-12 ==
* 17:36 gehel: restarting elasticsearch psi on elastic1052
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34556 and previous config saved to /var/cache/conftool/dbconfig/20220912-234833-ladsgroup.json
* 16:44 gehel: depool wdqs1007 to catch on lag
* 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34555 and previous config saved to /var/cache/conftool/dbconfig/20220912-233327-ladsgroup.json
* 16:43 gehel: restart blazegraph on wdqs1007
* 22:53 mutante: phabricator - disabling MediaWiki extension repositories in Diffusion that have 0 commits - [[phab:T296022|T296022]] - [[phab:T315706|T315706]]
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34554 and previous config saved to /var/cache/conftool/dbconfig/20220912-224006-ladsgroup.json
* 22:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34553 and previous config saved to /var/cache/conftool/dbconfig/20220912-223927-ladsgroup.json
* 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34552 and previous config saved to /var/cache/conftool/dbconfig/20220912-222420-ladsgroup.json
* 22:23 mutante: phabricator - disabling repositories: tool-xh-bot, tool-editor-contribution-dashboard, tool-ranker, tool-editor-contribution, tool-mikasa-bot-1, tool-maintun, tool-add-text, tool-wikibookassamese-book.php (none of them had commits) [[phab:T296022|T296022]] - [[phab:T315706|T315706]]
* 22:20 mutante: phabricator - disabling repository "tool-ranker"
* 22:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34551 and previous config saved to /var/cache/conftool/dbconfig/20220912-220914-ladsgroup.json
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34550 and previous config saved to /var/cache/conftool/dbconfig/20220912-215407-ladsgroup.json
* 21:07 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 19s)
* 21:07 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34549 and previous config saved to /var/cache/conftool/dbconfig/20220912-210123-ladsgroup.json
* 20:57 TheresNoTime: closing UTC late backport window
* 20:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]] (duration: 05m 00s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:51 samtar@deploy1002: Started scap: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]]
* 20:49 samtar@deploy1002: Finished scap: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]] (duration: 07m 27s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34548 and previous config saved to /var/cache/conftool/dbconfig/20220912-204617-ladsgroup.json
* 20:42 samtar@deploy1002: samtar and jdlrobson: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:42 samtar@deploy1002: Started scap: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]]
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:40 samtar@deploy1002: Finished scap: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]] (duration: 06m 25s)
* 20:39 jhathaway: testing exim config change on mx1001.wikimedia.org
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:38 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dispatch-be1001.eqiad.wmnet
* 20:34 samtar@deploy1002: samtar and dani: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:34 samtar@deploy1002: Started scap: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]]
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]] (duration: 06m 12s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34547 and previous config saved to /var/cache/conftool/dbconfig/20220912-203110-ladsgroup.json
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:26 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:26 samtar@deploy1002: Started scap: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]]
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:830982{{!}}Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)]] (duration: 08m 14s)
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34546 and previous config saved to /var/cache/conftool/dbconfig/20220912-202359-ladsgroup.json
* 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:16 samtar@deploy1002: samtar and aishik: Backport for [[gerrit:830982{{!}}Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:16 samtar@deploy1002: Started scap: Backport for [[gerrit:830982{{!}}Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)]]
* 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34545 and previous config saved to /var/cache/conftool/dbconfig/20220912-201604-ladsgroup.json
* 20:15 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dispatch-be1001.eqiad.wmnet on all recursors
* 20:14 herron@cumin1001: START - Cookbook sre.dns.wipe-cache dispatch-be1001.eqiad.wmnet on all recursors
* 20:14 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:14 samtar@deploy1002: Finished scap: Backport for [[gerrit:831167{{!}}Mark spcomwiki and searchcomwiki as closed (T285685)]] (duration: 05m 40s)
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:12 herron@cumin1001: START - Cookbook sre.dns.netbox
* 20:12 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host dispatch-be1001.eqiad.wmnet
* 20:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:09 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:831167{{!}}Mark spcomwiki and searchcomwiki as closed (T285685)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:08 samtar@deploy1002: Started scap: Backport for [[gerrit:831167{{!}}Mark spcomwiki and searchcomwiki as closed (T285685)]]
* 20:07 samtar@deploy1002: backport aborted:  (duration: 03m 46s)
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts theemin.codfw.wmnet
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:59 maryum: deployed security patch for [[phab:T314245|T314245]]
* 19:59 pt1979@cumin2002: START - Cookbook sre.hosts.decommission for hosts theemin.codfw.wmnet
* 19:58 mstyles@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/PageTriage/includes/Api/ApiPageTriageAction.php: (no justification provided) (duration: 03m 42s)
* 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34544 and previous config saved to /var/cache/conftool/dbconfig/20220912-195540-ladsgroup.json
* 19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
* 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34543 and previous config saved to /var/cache/conftool/dbconfig/20220912-195519-ladsgroup.json
* 19:53 sbassett: Deployed security patch for [[phab:T311337|T311337]]
* 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P34542 and previous config saved to /var/cache/conftool/dbconfig/20220912-194013-ladsgroup.json
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34541 and previous config saved to /var/cache/conftool/dbconfig/20220912-192858-ladsgroup.json
* 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34540 and previous config saved to /var/cache/conftool/dbconfig/20220912-192837-ladsgroup.json
* 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@e012d14]: 0.3.116 (duration: 02m 04s)
* 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@e012d14]: 0.3.116
* 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P34539 and previous config saved to /var/cache/conftool/dbconfig/20220912-192506-ladsgroup.json
* 19:20 dancy@deploy1002: Installation of scap version "4.19.0" completed for 561 hosts
* 19:20 dancy@deploy1002: Installing scap version "4.19.0" for 561 hosts
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34538 and previous config saved to /var/cache/conftool/dbconfig/20220912-191330-ladsgroup.json
* 19:12 dancy@deploy1002: Installation of scap version "4.18.0" completed for 561 hosts
* 19:12 dancy@deploy1002: Installing scap version "4.18.0" for 561 hosts
* 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34537 and previous config saved to /var/cache/conftool/dbconfig/20220912-191000-ladsgroup.json
* 19:08 inflatador: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 19:04 ryankemper: [WCQS] Depooled `wcqs100[1,2]` while they catch up on ~1.5 days worth of lag (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=w