You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31256 and previous config saved to /var/cache/conftool/dbconfig/20220717-180539-ladsgroup.json)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply)
 
(78 intermediate revisions by the same user not shown)
Line 1: Line 1:
== 2022-07-17 ==
== 2022-10-06 ==
* 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31256 and previous config saved to /var/cache/conftool/dbconfig/20220717-180539-ladsgroup.json
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31255 and previous config saved to /var/cache/conftool/dbconfig/20220717-175034-ladsgroup.json
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31254 and previous config saved to /var/cache/conftool/dbconfig/20220717-173528-ladsgroup.json
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31253 and previous config saved to /var/cache/conftool/dbconfig/20220717-172023-ladsgroup.json
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31252 and previous config saved to /var/cache/conftool/dbconfig/20220717-155102-ladsgroup.json
* 21:08 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:839577{{!}}Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)]] (duration: 06m 08s)
* 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 21:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1004.eqiad.wmnet
* 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 21:02 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for [[gerrit:839577{{!}}Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance
* 21:01 thcipriani@deploy1002: Started scap: Backport for [[gerrit:839577{{!}}Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)]]
* 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31251 and previous config saved to /var/cache/conftool/dbconfig/20220717-155025-ladsgroup.json
* 20:58 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31250 and previous config saved to /var/cache/conftool/dbconfig/20220717-153520-ladsgroup.json
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31249 and previous config saved to /var/cache/conftool/dbconfig/20220717-152015-ladsgroup.json
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31248 and previous config saved to /var/cache/conftool/dbconfig/20220717-150510-ladsgroup.json
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31247 and previous config saved to /var/cache/conftool/dbconfig/20220717-132751-ladsgroup.json
* 20:45 samtar@deploy1002: Finished scap: Backport for [[gerrit:839575{{!}}Replace promise handling when AfD'ing pages (T238025)]], [[gerrit:839576{{!}}Replace promise handling when AfD'ing pages (T238025)]] (duration: 07m 56s)
* 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance
* 20:39 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1004.eqiad.wmnet
* 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31246 and previous config saved to /var/cache/conftool/dbconfig/20220717-132731-ladsgroup.json
* 20:37 samtar@deploy1002: samtar and samtar: Backport for [[gerrit:839575{{!}}Replace promise handling when AfD'ing pages (T238025)]], [[gerrit:839576{{!}}Replace promise handling when AfD'ing pages (T238025)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31245 and previous config saved to /var/cache/conftool/dbconfig/20220717-131226-ladsgroup.json
* 20:37 samtar@deploy1002: Started scap: Backport for [[gerrit:839575{{!}}Replace promise handling when AfD'ing pages (T238025)]], [[gerrit:839576{{!}}Replace promise handling when AfD'ing pages (T238025)]]
* 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31244 and previous config saved to /var/cache/conftool/dbconfig/20220717-125720-ladsgroup.json
* 20:36 samtar@deploy1002: Backport cancelled.
* 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31243 and previous config saved to /var/cache/conftool/dbconfig/20220717-124215-ladsgroup.json
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31242 and previous config saved to /var/cache/conftool/dbconfig/20220717-110523-ladsgroup.json
* 20:34 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:839572{{!}}Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)]] (duration: 09m 51s)
* 11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1003.eqiad.wmnet
* 11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance
* 20:33 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31241 and previous config saved to /var/cache/conftool/dbconfig/20220717-110503-ladsgroup.json
* 20:32 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31240 and previous config saved to /var/cache/conftool/dbconfig/20220717-104958-ladsgroup.json
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31239 and previous config saved to /var/cache/conftool/dbconfig/20220717-103453-ladsgroup.json
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31238 and previous config saved to /var/cache/conftool/dbconfig/20220717-101948-ladsgroup.json
* 20:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1003.eqiad.wmnet
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31237 and previous config saved to /var/cache/conftool/dbconfig/20220717-084432-ladsgroup.json
* 20:25 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for [[gerrit:839572{{!}}Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 20:24 thcipriani@deploy1002: Started scap: Backport for [[gerrit:839572{{!}}Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)]]
* 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31236 and previous config saved to /var/cache/conftool/dbconfig/20220717-084411-ladsgroup.json
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31235 and previous config saved to /var/cache/conftool/dbconfig/20220717-082906-ladsgroup.json
* 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31234 and previous config saved to /var/cache/conftool/dbconfig/20220717-081401-ladsgroup.json
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31233 and previous config saved to /var/cache/conftool/dbconfig/20220717-075856-ladsgroup.json
* 20:05 samtar@deploy1002: backport aborted: (duration: 03m 13s)
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31232 and previous config saved to /var/cache/conftool/dbconfig/20220717-071149-ladsgroup.json
* 20:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 19:51 SandraEbele: Started airflow projectview_hourly_dag
* 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance
* 19:50 SandraEbele: killed Oozie projectview-hourly job
* 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31231 and previous config saved to /var/cache/conftool/dbconfig/20220717-071129-ladsgroup.json
* 19:41 SandraEbele: deployed airflow to fix projectview_hourly_dag
* 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31230 and previous config saved to /var/cache/conftool/dbconfig/20220717-065624-ladsgroup.json
* 19:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cbdc509]: (no justification provided) (duration: 00m 14s)
* 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31229 and previous config saved to /var/cache/conftool/dbconfig/20220717-064119-ladsgroup.json
* 19:34 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cbdc509]: (no justification provided)
* 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31228 and previous config saved to /var/cache/conftool/dbconfig/20220717-062614-ladsgroup.json
* 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31227 and previous config saved to /var/cache/conftool/dbconfig/20220717-044802-ladsgroup.json
* 19:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 19:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance
* 19:29 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 04:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance
* 19:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 19:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.3  refs [[phab:T314193|T314193]]
* 04:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 19:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31226 and previous config saved to /var/cache/conftool/dbconfig/20220717-010309-ladsgroup.json
* 19:15 brennen: train 1.40.0-wmf.4 ([[phab:T314193|T314193]]) no current blockers, rolling train to all wikis
* 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31225 and previous config saved to /var/cache/conftool/dbconfig/20220717-004804-ladsgroup.json
* 19:03 inflatador: 'bking@elastic restarted elastic2025, 2031, 2061, 2084 [[phab:T313431|T313431]]
* 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31224 and previous config saved to /var/cache/conftool/dbconfig/20220717-003259-ladsgroup.json
* 18:52 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic[2025,2031].codfw.wmnet with reason: restarting for config reload - [[phab:T313431|T313431]]
* 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31223 and previous config saved to /var/cache/conftool/dbconfig/20220717-001754-ladsgroup.json
* 18:52 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic[2025,2031].codfw.wmnet with reason: restarting for config reload - [[phab:T313431|T313431]]
* 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31222 and previous config saved to /var/cache/conftool/dbconfig/20220717-000143-ladsgroup.json
* 18:51 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic2084.codfw.wmnet with reason: restarting for config reload - [[phab:T313431|T313431]]
* 00:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 18:50 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic2084.codfw.wmnet with reason: restarting for config reload - [[phab:T313431|T313431]]
* 00:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 18:50 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic2061.codfw.wmnet with reason: restarting for config reload - [[phab:T313431|T313431]]
* 18:50 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic2061.codfw.wmnet with reason: restarting for config reload - [[phab:T313431|T313431]]
* 18:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1003.eqiad.wmnet
* 18:39 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:35 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 18:29 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1003.eqiad.wmnet
* 16:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
* 16:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
* 15:57 topranks: Applying explicit BFD mode configuration to cr4-ulsfo for Anycast BGP groups.
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:48 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1004.eqiad.wmnet with OS bullseye
* 15:47 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
* 15:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 15:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1009.eqiad.wmnet
* 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:17 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 jynus: reload haproxy config on dbproxy1016, dbproxy1017
* 15:11 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1009.eqiad.wmnet
* 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1008.eqiad.wmnet
* 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:08 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:08 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:08 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 15:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 15:01 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1008.eqiad.wmnet
* 15:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 14:56 bblack: eqiad front edge depooled in DNS
* 14:49 XioNoX: move asw2-d-eqiad<->cr1 link to new 40G link - [[phab:T313385|T313385]]
* 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 14:43 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet1005.eqiad.wmnet on all recursors
* 14:43 cmooney@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet1005.eqiad.wmnet on all recursors
* 14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) failoid2001.codfw.wmnet on codfw recursors
* 14:40 volans@cumin1001: START - Cookbook sre.dns.wipe-cache failoid2001.codfw.wmnet on codfw recursors
* 14:40 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 14:30 XioNoX: moving eqiad row C vrrp mastership to cr1-eqiad
* 14:28 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 14:16 hashar: Gerrit upgraded from 3.4.5 to 3.4.6 # [[phab:T319513|T319513]]
* 14:13 XioNoX: move asw2-c-eqiad<->cr1 link to new 40G link - [[phab:T313385|T313385]]
* 14:12 hashar@deploy1002: Finished deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit1001 (duration: 00m 08s)
* 14:12 hashar@deploy1002: Started deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit1001
* 14:12 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 14:12 hashar: Upgrading primary Gerrit # [[phab:T319513|T319513]]
* 14:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 14:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit2002 (duration: 00m 10s)
* 14:08 hashar@deploy1002: Started deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit2002
* 14:07 vgutierrez: updating HAProxy to version 2.4.19 in ulsfo
* 14:03 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts aqs1007.eqiad.wmnet
* 14:03 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:01 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 13:48 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1007.eqiad.wmnet
* 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
* 13:20 urbanecm: UTC afternoon backport window done
* 13:20 moritzm: draining ganeti1014 [[phab:T311687|T311687]]
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:839500{{!}}Show thumbnails on Special:Search for NS_FILE + PageImages (T306883)]] (duration: 05m 12s)
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:17 vgutierrez: partition ats-be cache in cp6008 - [[phab:T317748|T317748]]
* 13:16 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1006.eqiad.wmnet
* 13:16 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:15 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:14 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:14 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:13 urbanecm@deploy1002: urbanecm and mlitn: Backport for [[gerrit:839500{{!}}Show thumbnails on Special:Search for NS_FILE + PageImages (T306883)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 13:13 urbanecm@deploy1002: Started scap: Backport for [[gerrit:839500{{!}}Show thumbnails on Special:Search for NS_FILE + PageImages (T306883)]]
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:12 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:826882{{!}}Explicit config for Wikistories discovery module (T314582)]] (duration: 06m 37s)
* 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 13:06 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:06 urbanecm@deploy1002: urbanecm and sbisson: Backport for [[gerrit:826882{{!}}Explicit config for Wikistories discovery module (T314582)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:06 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:05 urbanecm@deploy1002: Started scap: Backport for [[gerrit:826882{{!}}Explicit config for Wikistories discovery module (T314582)]]
* 12:59 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1026.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
* 12:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1026.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
* 12:54 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1006.eqiad.wmnet
* 12:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1029.eqiad.wmnet
* 12:43 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:42 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:40 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
* 12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:36 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 12:34 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
* 12:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1005.eqiad.wmnet
* 12:24 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:21 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 12:15 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1005.eqiad.wmnet
* 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1012.eqiad.wmnet to cluster eqiad and group C
* 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1004.eqiad.wmnet
* 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:28 jbond: enable puppet post deploy  puppetdb change 814824
* 11:27 jbond: switch puppetdb replication to use replications slots
* 11:27 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 11:27 btullis: cold-reset the BMC on analytics1076
* 11:22 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1004.eqiad.wmnet
* 10:58 jbond: disable puppet temporarily to deploy a puppetdb change 814824
* 10:51 _joe_: installing the upgraded php package everywhere, [[phab:T318918|T318918]]
* 10:30 elukey: restart kafka on kafka-logging1003 to reload the conifg (cleanup old super.users related to past keystore)
* 10:16 moritzm: installing ruby-rack security updates
* 10:11 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining wikis
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging NOkafor out of all services on: 1213 hosts
* 10:07 jmm@cumin2002: START - Cookbook sre.idm.logout Logging NOkafor out of all services on: 1213 hosts
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging NOkafor out of all services on: 799 hosts
* 10:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging NOkafor out of all services on: 799 hosts
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jmads out of all services on: 799 hosts
* 10:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 799 hosts
* 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:02 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for all wikis (duration: 03m 39s)
* 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jmads out of all services on: 1213 hosts
* 10:00 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 1213 hosts
* 09:57 moritzm: installing glib2.0 security updates on buster
* 09:52 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for itwiki, arzwiki, ptwiki
* 09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1005.eqiad.wmnet
* 09:34 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 09:32 moritzm: installing python-oslo.utils security updates
* 09:28 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for viwiki, metawiki, frwiktionary
* 09:22 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for nlwiktionary, ruwiki, jawiki
* 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:21 _joe_: installed the upgraded php package to mw1414, [[phab:T318918|T318918]]
* 09:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:18 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for nine wikis (duration: 03m 41s)
* 09:05 topranks: re-pooling esams after cr2-esams line card reboot
* 09:04 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for cebwiki
* 09:04 hoo: Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for specieswiki
* 09:04 hoo: Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for ruwiktionary
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:59 _joe_: uploaded new php 7.4 packages [[phab:T318918|T318918]]
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:54 topranks: rebooting line card fpc 0 on cr2-esams ([[phab:T318783|T318783]])
* 08:53 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for three wikis (duration: 04m 03s)
* 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:48 moritzm: installing jetty9 security updates
* 08:42 moritzm: installing rails security updates
* 08:37 moritzm: installing puma security updates
* 08:27 topranks: disabling OSPF on cr2-esams
* 08:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-esams,cr2-esams IPv6,re0.cr2-esams.mgmt with reason: line card reboot
* 08:24 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-esams,cr2-esams IPv6,re0.cr2-esams.mgmt with reason: line card reboot
* 08:21 topranks: disabling external BGP sessions on cr2-esams prior to line card reboot
* 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
* 08:10 elukey: restart kafka on kafka-logging1002 to reload the conifg (cleanup old super.users related to past keystore)
* 08:10 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
* 08:09 elukey: kafka logging old cert cleanup - `cumin 'A:kafka-logging' 'rm -f /etc/kafka/ssl/kafka_logging-eqiad_broker.keystore.jks'`
* 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1012.eqiad.wmnet to cluster eqiad and group C
* 08:00 elukey: delete /etc/kafka/ssl/kafka_logging-eqiad_broker.keystore.jks on kafka-logging1001 and restart (old puppet cert + settings deleted)
* 07:50 topranks: De-pooling esams in advance of cr2-esams line card reboot
* 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
* 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
* 07:36 moritzm: draining ganeti1026 [[phab:T311687|T311687]]
* 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1012.eqiad.wmnet with OS bullseye
* 07:15 moritzm: draining ganeti1005 [[phab:T311687|T311687]]
* 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1012.eqiad.wmnet with reason: host reimage
* 07:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1012.eqiad.wmnet with reason: host reimage
* 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1012.eqiad.wmnet with OS bullseye
* 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6079
* 06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6079
* 06:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 22616
* 06:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 22616
* 01:12 reedy@deploy1002: Finished deploy [integration/docroot@dc380cb]: Update jQuery (duration: 00m 11s)
* 01:12 reedy@deploy1002: Started deploy [integration/docroot@dc380cb]: Update jQuery
* 01:03 reedy@deploy1002: Finished deploy [integration/docroot@5cd2243]: Minor fixes (duration: 00m 12s)
* 01:03 reedy@deploy1002: Started deploy [integration/docroot@5cd2243]: Minor fixes
* 00:35 reedy@deploy1002: Finished deploy [integration/docroot@13687ed]: More minor updates (duration: 00m 30s)
* 00:35 reedy@deploy1002: Started deploy [integration/docroot@13687ed]: More minor updates


== 2022-07-16 ==
== 2022-10-05 ==
* 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31221 and previous config saved to /var/cache/conftool/dbconfig/20220716-221808-ladsgroup.json
* 22:27 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: Cleanup and timestamps (duration: 00m 07s)
* 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31220 and previous config saved to /var/cache/conftool/dbconfig/20220716-220303-ladsgroup.json
* 22:27 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: Cleanup and timestamps
* 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31219 and previous config saved to /var/cache/conftool/dbconfig/20220716-214758-ladsgroup.json
* 22:21 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: (no justification provided) (duration: 00m 06s)
* 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31218 and previous config saved to /var/cache/conftool/dbconfig/20220716-213253-ladsgroup.json
* 22:21 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: (no justification provided)
* 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31217 and previous config saved to /var/cache/conftool/dbconfig/20220716-203238-ladsgroup.json
* 22:19 reedy@deploy1002: deploy aborted: Cleanup and timestamps (duration: 00m 22s)
* 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 22:19 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: Cleanup and timestamps
* 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 22:18 dancy@deploy1002: Finished deploy [integration/docroot@a136ce6]: (no justification provided) (duration: 00m 10s)
* 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
* 22:17 dancy@deploy1002: Started deploy [integration/docroot@a136ce6]: (no justification provided)
* 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
* 22:17 dancy@deploy1002: Installation of scap version "4.27.0" completed for 559 hosts
* 20:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 22:17 dancy@deploy1002: Installing scap version "4.27.0" for 559 hosts
* 20:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 21:41 dancy@deploy1002: Installation of scap version "4.26.0" completed for 559 hosts
* 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 21:41 dancy@deploy1002: Installing scap version "4.26.0" for 559 hosts
* 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 20:33 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 01m 05s)
* 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31216 and previous config saved to /var/cache/conftool/dbconfig/20220716-200803-ladsgroup.json
* 20:32 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
* 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P31215 and previous config saved to /var/cache/conftool/dbconfig/20220716-195258-ladsgroup.json
* 20:27 sukhe: running authdns-update for CR 838882
* 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P31214 and previous config saved to /var/cache/conftool/dbconfig/20220716-193753-ladsgroup.json
* 20:26 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 10s)
* 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31213 and previous config saved to /var/cache/conftool/dbconfig/20220716-192248-ladsgroup.json
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31212 and previous config saved to /var/cache/conftool/dbconfig/20220716-184459-ladsgroup.json
* 20:26 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
* 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 20:25 sukhe: homer "cr*-ulsfo*" commit "Gerrit 838239: sites.yaml: add dns4003 to anycast_neighbors"
* 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 20:24 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 06s)
* 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31211 and previous config saved to /var/cache/conftool/dbconfig/20220716-184428-ladsgroup.json
* 20:23 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
* 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31210 and previous config saved to /var/cache/conftool/dbconfig/20220716-182922-ladsgroup.json
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P31209 and previous config saved to /var/cache/conftool/dbconfig/20220716-181417-ladsgroup.json
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31208 and previous config saved to /var/cache/conftool/dbconfig/20220716-175912-ladsgroup.json
* 20:22 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 31s)
* 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31207 and previous config saved to /var/cache/conftool/dbconfig/20220716-174959-ladsgroup.json
* 20:22 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
* 17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1136.eqiad.wmnet with reason: Maintenance
* 20:19 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 42s)
* 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 20:19 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:837695{{!}}Remove Research Incentive survey from arwiki (T318328)]] (duration: 05m 13s)
* 17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 20:19 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
* 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31205 and previous config saved to /var/cache/conftool/dbconfig/20220716-173811-ladsgroup.json
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31204 and previous config saved to /var/cache/conftool/dbconfig/20220716-172305-ladsgroup.json
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P31203 and previous config saved to /var/cache/conftool/dbconfig/20220716-170800-ladsgroup.json
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31202 and previous config saved to /var/cache/conftool/dbconfig/20220716-165255-ladsgroup.json
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31201 and previous config saved to /var/cache/conftool/dbconfig/20220716-163449-ladsgroup.json
* 20:14 urbanecm@deploy1002: urbanecm and dani: Backport for [[gerrit:837695{{!}}Remove Research Incentive survey from arwiki (T318328)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 20:14 urbanecm@deploy1002: Started scap: Backport for [[gerrit:837695{{!}}Remove Research Incentive survey from arwiki (T318328)]]
* 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 20:11 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:834044{{!}}Deploy Research Incentive survey on eswiki (T318331)]] (duration: 06m 51s)
* 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31200 and previous config saved to /var/cache/conftool/dbconfig/20220716-163418-ladsgroup.json
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31199 and previous config saved to /var/cache/conftool/dbconfig/20220716-161913-ladsgroup.json
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P31198 and previous config saved to /var/cache/conftool/dbconfig/20220716-160408-ladsgroup.json
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31197 and previous config saved to /var/cache/conftool/dbconfig/20220716-154903-ladsgroup.json
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31196 and previous config saved to /var/cache/conftool/dbconfig/20220716-153647-ladsgroup.json
* 20:05 urbanecm@deploy1002: urbanecm and dani: Backport for [[gerrit:834044{{!}}Deploy Research Incentive survey on eswiki (T318331)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:05 urbanecm@deploy1002: Started scap: Backport for [[gerrit:834044{{!}}Deploy Research Incentive survey on eswiki (T318331)]]
* 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 20:03 mutante: registry* (4 servers) - disabling puppet, deploying gerrit:838859 - [[phab:T308501|T308501]]
* 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31195 and previous config saved to /var/cache/conftool/dbconfig/20220716-153627-ladsgroup.json
* 19:57 reedy@deploy1002: Finished deploy [integration/docroot@09eb565]: [[phab:T319461|T319461]] and cleanup (duration: 00m 10s)
* 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31194 and previous config saved to /var/cache/conftool/dbconfig/20220716-152122-ladsgroup.json
* 19:56 reedy@deploy1002: Started deploy [integration/docroot@09eb565]: [[phab:T319461|T319461]] and cleanup
* 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P31193 and previous config saved to /var/cache/conftool/dbconfig/20220716-150616-ladsgroup.json
* 18:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31192 and previous config saved to /var/cache/conftool/dbconfig/20220716-145111-ladsgroup.json
* 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4003.wikimedia.org with OS buster
* 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31191 and previous config saved to /var/cache/conftool/dbconfig/20220716-143705-ladsgroup.json
* 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 18:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31190 and previous config saved to /var/cache/conftool/dbconfig/20220716-143645-ladsgroup.json
* 18:27 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]] (duration: 03m 40s)
* 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31189 and previous config saved to /var/cache/conftool/dbconfig/20220716-142140-ladsgroup.json
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P31188 and previous config saved to /var/cache/conftool/dbconfig/20220716-140634-ladsgroup.json
* 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31187 and previous config saved to /var/cache/conftool/dbconfig/20220716-135129-ladsgroup.json
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31186 and previous config saved to /var/cache/conftool/dbconfig/20220716-134429-ladsgroup.json
* 18:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.4  refs [[phab:T314193|T314193]]
* 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 18:18 brennen: train 1.40.0-wmf.4 ([[phab:T314193|T314193]]) no current blockers, rolling train to group1
* 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
* 00:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS bullseye
* 17:54 ejegg: payments-wiki upgraded from {{Gerrit|aeee9676}} to {{Gerrit|4e1f308b}}
* 00:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2064.codfw.wmnet with reason: host reimage
* 17:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
* 00:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2064.codfw.wmnet with reason: host reimage
* 17:20 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 00m 14s)
* 00:13 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS bullseye
* 17:20 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
* 17:18 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 00m 18s)
* 17:18 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
* 17:17 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 04m 24s)
* 17:12 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:53 cjming: deployed labs-only config
* 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
* 15:39 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
* 15:29 moritzm: installing gdal security updates
* 15:27 SandraEbele: deployed refinery source
* 14:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on cloudnet[1005-1006].eqiad.wmnet with reason: migrating
* 14:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: decom
* 14:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: decom
* 14:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1003.eqiad.wmnet with reason: decom
* 14:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1003.eqiad.wmnet with reason: decom
* 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8359
* 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8359
* 14:30 papaul: on going maintenance on msw1-eqiad
* 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1032.eqiad.wmnet with OS bullseye
* 14:20 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 04m 24s)
* 14:16 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
* 14:16 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a]: Regular analytics weekly train [analytics/refinery@7e16d2a] (duration: 10m 27s)
* 14:15 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
* 14:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
* 14:07 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 14:05 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a]: Regular analytics weekly train [analytics/refinery@7e16d2a]
* 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1032.eqiad.wmnet with OS bullseye
* 13:37 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@f7a68c2]: (no justification provided) (duration: 00m 12s)
* 13:36 ebysans@deploy1002: Started deploy [airflow-dags/analytics@f7a68c2]: (no justification provided)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:22 SandraEbele: deploying fix for projectview dags on airflow
* 13:21 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for enwiktionary/frwiki (duration: 03m 38s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1031.eqiad.wmnet with OS bullseye
* 13:07 moritzm: draining ganeti1012 [[phab:T311687|T311687]]
* 13:04 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for zhwiki
* 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
* 13:00 vgutierrez: test HAProxy 2.4.19 in cp4026 && cp4032
* 12:59 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia # fetch HAProxy 2.4.19
* 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
* 12:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1031.eqiad.wmnet with OS bullseye
* 12:47 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1031.eqiad.wmnet with OS bullseye
* 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1031.eqiad.wmnet with OS bullseye
* 12:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 12:41 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for enwiki
* 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1030.eqiad.wmnet with OS bullseye
* 12:30 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:28 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for enwiki/zhwiki (duration: 03m 46s)
* 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1030.eqiad.wmnet with reason: host reimage
* 12:16 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1030.eqiad.wmnet with reason: host reimage
* 12:13 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 12:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1030.eqiad.wmnet with OS bullseye
* 11:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 11:53 XioNoX: fix MTU between eqiad core routers and cloudsw - [[phab:T315838|T315838]]
* 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1029.eqiad.wmnet with OS bullseye
* 11:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 11:49 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
* 11:49 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage
* 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 11:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1029.eqiad.wmnet with OS bullseye
* 11:04 moritzm: running "gnt-cluster upgrade --to 3.0" for ganeti/eqiad [[phab:T311687|T311687]]
* 11:01 vgutierrez: repool cp2036 - [[phab:T319394|T319394]]
* 10:53 vgutierrez: powercycle cp2036 - [[phab:T319394|T319394]]
* 10:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2036.codfw.wmnet
* 10:46 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for commonswiki
* 10:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:44 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for commonswiki (duration: 03m 51s)
* 10:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:36 moritzm: installing gdk-pixbuf security updates
* 09:52 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of ruwikinews
* 09:51 hoo: Ran extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php for all of arwiki
* 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for ruwikinews (duration: 03m 39s)
* 09:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:21 moritzm: upgrading ganeti/eqiad nodes to Ganeti 3 [[phab:T311687|T311687]]
* 09:20 dcausse: restarting blazegraph on wdqs1014 (BlazegraphFreeAllocatorsDecreasingRapidly)
* 09:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:09 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for arwiki (duration: 03m 49s)
* 09:06 moritzm: reimport ganeti 3.0.1-1~bpo10+1 to component/ganeti3 (got removed alongside via a reprepro bug/misfeature when the bullseye component was removed)
* 07:54 elukey: restart kafka on kafka-logging1003 to pick up new PKI TLS settings
* 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1003.eqiad.wmnet with reason: Kafka PKI upgrade
* 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1003.eqiad.wmnet with reason: Kafka PKI upgrade
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35360 and previous config saved to /var/cache/conftool/dbconfig/20221005-065519-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35359 and previous config saved to /var/cache/conftool/dbconfig/20221005-064014-root.json
* 06:30 elukey: restart kafka on kafka-logging1002 to pick up the new cert+settings for PKI
* 06:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1002.eqiad.wmnet with reason: Kafka PKI upgrade
* 06:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1002.eqiad.wmnet with reason: Kafka PKI upgrade
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35358 and previous config saved to /var/cache/conftool/dbconfig/20221005-062509-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35357 and previous config saved to /var/cache/conftool/dbconfig/20221005-061004-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35356 and previous config saved to /var/cache/conftool/dbconfig/20221005-055459-root.json
* 05:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62044
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35355 and previous config saved to /var/cache/conftool/dbconfig/20221005-053954-root.json
* 05:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62044
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35354 and previous config saved to /var/cache/conftool/dbconfig/20221005-052449-root.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35353 and previous config saved to /var/cache/conftool/dbconfig/20221005-050944-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P35352 and previous config saved to /var/cache/conftool/dbconfig/20221005-050018-root.json
* 02:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1023.eqiad.wmnet
* 02:21 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
* 02:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1023.eqiad.wmnet
* 02:20 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
* 02:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1023.eqiad.wmnet
* 02:19 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
* 00:05 sukhe: disable puppet on dns4003 till we resolve the puppet failures


== 2022-07-15 ==
== 2022-10-04 ==
* 23:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 23:09 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
* 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 21:28 cjming: end of UTC late backport window
* 23:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00
* 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:25 cjming@deploy1002: Finished scap: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] (duration: 05m 06s)
* 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:21 cjming@deploy1002: cjming and cjming: Backport for [[gerrit:838210{{!}}Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.


== 2022-07-14 ==
== 2022-10-03 ==
* 22:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
* 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T312984|T312984]])', diff saved to https://phabricator.wikimedia.org/P31112 and previous config saved to /var/cache/conftool/dbconfig/20220714-221112-ladsgroup.json
* 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
* 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P31111 and previous config saved
* 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
* 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
* 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
* 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work
* 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
* 19:22 ryankemper: [Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '<nowiki>{</nowiki>"transient":<nowiki>{</nowiki>"cluster.routing.allocation.exclude":<nowiki>{</nowiki>"_host": "","_name": "elastic1066-production-search-psi-eqiad"}'`);


== 2022-07-13 ==
== 2022-09-29 ==
* 22:17 inflatador: bking@elastic2055 successfully staged NIC firmware updates for elastic2055-2060
* 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35193 and previous config saved to /var/cache/conftool/dbconfig/20220929-224649-ladsgroup.json
* 22:09 inflatador: bking@elastic2055 staging NIC firmware updates for elastic2055-2060
* 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35192 and previous config saved to /var/cache/conftool/dbconfig/20220929-223143-ladsgroup.json
* 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P35191 and previous config saved to /var/cache/conftool/dbconfig/20220929-221637-ladsgroup.json
* 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35190 and previous config saved to /var/cache/conftool/dbconfig/20220929-220130-ladsgroup.json
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35189 and previous config saved to /var/cache/conftool/dbconfig/20220929-215333-ladsgroup.json
* 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 21:43 sukhe: alert1001: restart icinga
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:26 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4045.mgmt.ulsfo.wmnet with reboot policy FORCED
* 21:18 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:18 ejegg: payments-wiki upgraded from {{Gerrit|839d6dde}} to {{Gerrit|aeee9676}}
* 21:14 robh@cumin2002: START - Cookbook sre.dns.netbox
* 21:14 brennen: end of utc late backport and config window
* 21:14 brennen@deploy1002: Finished scap: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]] (duration: 08m 22s)
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:09 Lucas_WMDE: UTC late backport+config window done
* 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813691{{!}}Disable DiscussionTools beta feature at mediawikiwiki (T310960)]] (duration: 02m 47s)
* 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:06 brennen@deploy1002: brennen and ebernhardson: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:05 brennen@deploy1002: Started scap: Backport for [[gerrit:836719{{!}}cirrus: Don't configure cloud clusters for private wikis]]
* 21:02 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:812377{{!}}QuickSurveys: Undeploy 'research-incentive' (T311015)]] (2/2, beta) (duration: 02m 58s)
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:812377{{!}}QuickSurveys: Undeploy 'research-incentive' (T311015)]] (1/2, prod) (duration: 02m 48s)
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 ryankemper: [[phab:T313431|T313431]] Repooled `elastic[2073-2074,2080-2081,2083,2086].codfw.wmnet`. Codfw's all on 5 masters now and cluster is back to green.
* 20:58 brennen@deploy1002: Sync cancelled.
* 20:58 brennen@deploy1002: brennen and trainbranchbot: Backport for [[gerrit:836928{{!}}Revert "cirrus: Don't configure cloud clusters for private wikis"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:58 ryankemper: [[phab:T313431|T313431]] Updated cross-cluster seed conf with new masters; should resolve the settings check alerts
* 20:58 brennen@deploy1002: Started scap: Backport for [[gerrit:836928{{!}}Revert "cirrus: Don't configure cloud clusters for private wikis"]]
* 20:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4027.ulsfo.wmnet
* 20:57 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/CommentItem.js: Backport: [[gerrit:813666{{!}}Avoid localized digits in internal timestamps in JS (T312828)]] (duration: 02m 49s)
* 20:52 brennen@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.3" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.gcoIZ0BTKW"' returned non-zero exit status 255. (duration: 00m 00s)
* 20:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2040.codfw.wmnet with OS bullseye
* 20:52 brennen@deploy1002: Started scap: Backport for [[gerrit:836886{{!}}cirrus: Don't configure cloud clusters for private wikis]]
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:49 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:813340{{!}}Undeploy CongressLookup (part 3) (T312894)]] (duration: 03m 00s)
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:46 brennen@deploy1002: Sync cancelled.
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:45 brennen@deploy1002: brennen and trainbranchbot: Backport for [[gerrit:836922{{!}}Revert "Add Nepalese Wikipedia tagline"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:45 brennen@deploy1002: Started scap: Backport for [[gerrit:836922{{!}}Revert "Add Nepalese Wikipedia tagline"]]
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch1001.eqiad.wmnet with OS bullseye
* 20:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813339{{!}}Undeploy CongressLookup (part 2) (T312894)]] (duration: 02m 53s)
* 20:42 brennen@deploy1002: Sync cancelled.
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:41 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:836880{{!}}Add Nepalese Wikipedia tagline (T318737)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic2080` to pick up new master-eligible status
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 brennen@deploy1002: Started scap: Backport for [[gerrit:836880{{!}}Add Nepalese Wikipedia tagline (T318737)]]
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:38 brennen@deploy1002: Finished scap: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]] (duration: 08m 03s)
* 20:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:813338{{!}}Undeploy CongressLookup (part 1) (T312894)]] (duration: 03m 04s)
* 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2040.codfw.wmnet with reason: host reimage
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2040.codfw.wmnet with reason: host reimage
* 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:59 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2040.codfw.wmnet with OS bullseye
* 20:35 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
* 18:20 sukhe: upload pdns-recursor_4.6.2-1+wmf11u1 to apt.wm.org (bullseye) - [[phab:T305589|T305589]]
* 20:35 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cp4027.ulsfo.wmnet
* 17:54 sukhe: upload dnsdist_1.7.2-1+wmf11u1 to apt.wm.org (bullseye) - [[phab:T305589|T305589]]
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:48 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 20:33 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4027.ulsfo.wmnet
* 17:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 20:30 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 16:17 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@e58e61d]: (no justification provided) (duration: 00m 10s)
* 20:30 brennen@deploy1002: Started scap: Backport for [[gerrit:836878{{!}}Enable desktop improvements on nowikimedia (T318344)]]
* 16:17 milimetric@deploy1002: Started deploy [airflow-dags/analytics@e58e61d]: (no justification provided)
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:59 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2040.codfw.wmnet with OS bullseye
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]] (duration: 08m 05s)
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:56 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2040.codfw.wmnet with OS bullseye
* 20:19 hoo: Ran foreachwikiindblist wikidataclient-test extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php
* 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:17 ejegg: payments-wiki upgraded from {{Gerrit|0456850e}} to {{Gerrit|839d6dde}} (with cache prefix altered for moved classes)
* 15:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic2086` to pick up new master-eligible status
* 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 brennen@deploy1002: brennen and jdlrobson: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 brennen@deploy1002: Started scap: Backport for [[gerrit:835246{{!}}Web team config cleanup (T316568)]]
* 15:12 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9edd1ab]: Deploy [airflow-dags/analytics@9edd1ab] (duration: 00m 10s)
* 20:04 ejegg: payments-wiki rolled back from {{Gerrit|839d6dde}} to {{Gerrit|0456850e}}
* 15:12 aqu@deploy1002: Started deploy [airflow-dags/analytics@9edd1ab]: Deploy [airflow-dags/analytics@9edd1ab]
* 19:56 ejegg: payments-wiki upgraded from {{Gerrit|0456850e}} to {{Gerrit|839d6dde}}
* 15:10 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9edd1ab]: Deploy [airflow-dags/analytics_test@9edd1ab] (duration: 00m 08s)
* 19:55 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic208[1,3]` to pick up new master-eligible status
* 15:10 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9edd1ab]: Deploy [airflow-dags/analytics_test@9edd1ab]
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
* 14:52 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
* 19:33 ryankemper: [[phab:T313431|T313431]] Restarting elasticsearch_7* services on `elastic207[3,4]` to pick up new master-eligible status
* 14:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
* 19:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 14:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05] (duration: 00m 12s)
* 19:29 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: [[phab:T313431|T313431]]
* 14:34 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05]
* 19:09 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4021.ulsfo.wmnet
* 14:19 aqu: Deployed refinery using scap, then deployed onto hdfs
* 19:09 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:11 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
* 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:08 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67] (duration: 07m 42s)
* 19:04 robh@cumin2002: START - Cookbook sre.dns.netbox
* 14:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
* 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:01 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67]
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:00 aqu@deploy1002: Finished deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67] (duration: 00m 07s)
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:00 aqu@deploy1002: Started deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67]
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:47 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from x1 master', diff saved to https://phabricator.wikimedia.org/P31037 and previous config saved to /var/cache/conftool/dbconfig/20220713-134413-marostegui.json
* 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:37 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye
* 18:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4021.ulsfo.wmnet
* 13:20 Lucas_WMDE: UTC afternoon backport window done
* 18:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:20 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host elastic2049.codfw.wmnet
* 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1061.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790399{{!}}Configure wgLexemeLexicalCategoryItemIds on Wikidata (T307441)]] (duration: 02m 45s)
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1060.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1059.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1058.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1057.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1056.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1055.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1054.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-stretch1001.mgmt.eqiad.wmnet with reboot policy FORCED
* 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:06 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35188 and previous config saved to /var/cache/conftool/dbconfig/20220929-162812-ladsgroup.json
* 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
* 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35187 and previous config saved to /var/cache/conftool/dbconfig/20220929-162750-ladsgroup.json
* 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35186 and previous config saved to /var/cache/conftool/dbconfig/20220929-161244-ladsgroup.json
* 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P35185 and previous config saved to /var/cache/conftool/dbconfig/20220929-155737-ladsgroup.json
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:836858{{!}}Configure `mul` Wikibase language code on Beta wikis]] (beta-only, prod noop) (duration: 03m 41s)
* 15:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P35184 and previous config saved to /var/cache/conftool/dbconfig/20220929-154231-ladsgroup.json
* 15:35 dancy@deploy1002: Installation of scap version "4.25.0" completed for 561 hosts
* 15:35 dancy@deploy1002: Installing scap version "4.25.0" for 561 hosts
* 14:30 moritzm: installing glib2.0 security updates
* 14:29 moritzm: uploaded glib2.0 2.50.3-2+deb9u3+wmf1  to apt.wikimedia.org/stretch-wikimedia
* 14:17 moritzm: rolling restart of apache2 in mw/eqiad to pick up Expat security updates
* 14:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11164
* 14:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11164
* 13:54 claime: Enabled puppet for C:memcache hosts following merge [[gerrit:835585{{!}}C:memcached Fix memcached bootstrap]]
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 32934
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35179 and previous config saved to /var/cache/conftool/dbconfig/20220929-134844-root.json
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:46 claime: Disabling puppet for C:memcache hosts to merge [[gerrit:835585{{!}}C:memcached Fix memcached bootstrap]]
* 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:41 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 13:41 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]] (duration: 06m 13s)
* 13:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8966
* 13:39 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8966
* 13:35 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and hoo: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:34 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836803{{!}}Wikibase: Set UnconnectedPage page prop format for test wikis]]
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35178 and previous config saved to /var/cache/conftool/dbconfig/20220929-133339-root.json
* 13:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]] (duration: 05m 36s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kemayo: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 13:27 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836304{{!}}Stop mobile visual enhancements from rolling out to jawiki (T318871)]]
* 13:26 moritzm: restartting Apache on lists
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]] (duration: 05m 23s)
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35176 and previous config saved to /var/cache/conftool/dbconfig/20220929-131834-root.json
* 13:15 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 13:15 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [[gerrit:836227{{!}}Remove wmgEntityUsageModifierLimitsStatement on cebwiki (T296384)]]
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35175 and previous config saved to /var/cache/conftool/dbconfig/20220929-131507-root.json
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:11 moritzm: rolling restart of apache2 in mw/codfw to pick up Expat security updates
* 13:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835291{{!}}votewiki: Change wgLanguageCode to zh for Sep 2022 admins election (T318147)]] (duration: 03m 40s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813594{{!}}Configure $wgBabelCategoryNames on Test
* 13:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35174 and previous config saved to /var/cache/conftool/dbconfig/20220929-130329-root.json
* 13:01 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]] (duration: 04m 04s)
* 13:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35173 and previous config saved to /var/cache/conftool/dbconfig/20220929-130003-root.json
* 12:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:57 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.3  refs [[phab:T314192|T314192]]
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35172 and previous config saved to /var/cache/conftool/dbconfig/20220929-124824-root.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35171 and previous config saved to /var/cache/conftool/dbconfig/20220929-124458-root.json
* 12:44 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]] (duration: 09m 05s)
* 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:35 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 12:34 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:836713{{!}}Revert "rdbms: improve LoadBalancer connection pool reuse" (T318904)]]
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35169 and previous config saved to /var/cache/conftool/dbconfig/20220929-123319-root.json
* 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35168 and previous config saved to /var/cache/conftool/dbconfig/20220929-122953-root.json
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35167 and previous config saved to /var/cache/conftool/dbconfig/20220929-121814-root.json
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35166 and previous config saved to


== 2022-07-12 ==
== 2022-09-26 ==
* 22:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2039.codfw.wmnet with OS bullseye
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec (duration: 02m 04s)
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@45ae36d]: subgraph_and_query_metrics: Drop wiki from sparql event partition spec
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* 22:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 22:11 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2039.codfw.wmnet with reason: host reimage
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 21:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2039.codfw.wmnet with OS bullseye
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 20:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2038.codfw.wmnet with OS bullseye
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 20:07 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2038.codfw.wmnet with reason: host reimage
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 19:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 19:38 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 19:35 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 19:34 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 19:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 19:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2038.codfw.wmnet with OS bullseye
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:30 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2038.codfw.wmnet with OS bullseye
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2038.codfw.wmnet with OS bullseye
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:26 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I3071c009c}} (2) (duration: 02m 45s)
* 20:31 TheresNoTime: closing UTC late backport window
* 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 19:20 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I3071c009c}} (duration: 03m 09s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update [[phab:T312298|T312298]]
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic2038.codfw.wmnet with reason: firmware update [[phab:T312298|T312298]]
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 19:13 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic1065.eqiad.wmnet
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 19:13 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic1065.eqiad.wmnet
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 17:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 17:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2037.codfw.wmnet with OS bullseye
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 16:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 16:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2037.codfw.wmnet with reason: host reimage
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 16:55 bblack: codfw dns repooled for front edge traffic
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 16:50 herron: ran failed codfw puppet agents
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 16:47 mutante: doc1002 - systemctl reset-failed
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1026.eqiad.wmnet
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 16:36 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 16:19 mutante: rebooting mwdebug2001 via ganeti2022
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 16:15 cwhite: repair networking on people2002
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 16:11 cwhite: repair networking on puppetdb2002
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 16:10 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1026.eqiad.wmnet
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 16:05 mutante: parse200[1-3] - restarted ferm
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 16:03 mutante: mw2401 through mw2410 - performing ferm restarts (without cumin, has its own issue)
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 15:57 mutante: mw2405 - restarted ferm
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 15:50 bblack: codfw dns depooled for front edge traffic
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic1065.eqiad.wmnet with reason: firmware update [[phab:T312298|T312298]]
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 15:48 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic1065.eqiad.wmnet with reason: firmware update [[phab:T312298|T312298]]
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 15:30 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 15:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 15:06 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 15:06 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 15:06 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2037.codfw.wmnet with OS bullseye
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 15:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 15:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 15:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 14:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 14:56 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 14:52 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 14:48 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2037.codfw.wmnet with OS bullseye
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 14:48 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 14:47 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2037.codfw.wmnet with OS bullseye
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 14:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on druid1008.eqiad.wmnet with reason: [[phab:T308331|T308331]] btullis
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 14:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on druid1008.eqiad.wmnet with reason: [[phab:T308331|T308331]] btullis
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 14:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 14:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2037.codfw.wmnet with OS bullseye
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 14:30 papaul: on going PDU maintenenace in rack A5
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 14:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2037.codfw.wmnet
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 13:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 13:41 Lucas_WMDE: UTC afternoon backport window done
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/DiscussionTools/modules/CommentItem.js: Backport: [[gerrit:812956{{!}}Parse 'DiscussionToolsTimestampFormatSwitchTime' config value as UTC (T312828)]] (duration: 02m 50s)
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
* 13:56 moritzm: installing mako security updates
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Rack move, [[phab:T308331|T308331]]
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 12:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Rack move, [[phab:T308331|T308331]]
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 10:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to x1 master until the replica is back from maintenance', diff saved to https://phabricator.wikimedia.org/P31018 and previous config saved to /var/cache/conftool/dbconfig/20220712-101246-marostegui.json
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for onsite maintenance [[phab:T308331|T308331]]', diff saved to https://phabricator.wikimedia.org/P31017 and previous config saved to /var/cache/conftool/dbconfig/20220712-101211-root.json
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 09:12 hashar: Restarted Zuul [[phab:T309371|T309371]]
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:58 hashar: Restarted Gerrit [[phab:T309371|T309371]]
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 08:25 hashar@deploy1002: Finished deploy [integration/docroot@c2cceaf]: Fix NPM URL for Wikimedia language-data library (duration: 00m 08s)
* 12:25 moritzm: installing unzip security updates
* 08:25 hashar@deploy1002: Started deploy [integration/docroot@c2cceaf]: Fix NPM URL for Wikimedia language-data library
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:10 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@89cb17d]: subgraph_and_query_mapping: Increase executor memory to 12g, use repartition (duration: 02m 02s)
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 07:08 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@89cb17d]: subgraph_and_query_mapping: Increase executor memory to 12g, use repartition
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P31014 and previous config saved to /var/cache/conftool/dbconfig/20220712-070240-root.json
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31013 and previous config saved to /var/cache/conftool/dbconfig/20220712-065352-root.json
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31012 and previous config saved to /var/cache/conftool/dbconfig/20220712-063848-root.json
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31011 and previous config saved to /var/cache/conftool/dbconfig/20220712-062344-root.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 06:12 marostegui: dbmaint s3@eqiad [[phab:T310011|T310011]]
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 [[phab:T311610|T311610]]', diff saved to https://phabricator.wikimedia.org/P31010 and previous config saved to /var/cache/conftool/dbconfig/20220712-060407-root.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1157 to s3 primary and set section read-write [[phab:T311610|T311610]]', diff saved to https://phabricator.wikimedia.org/P31009 and previous config saved to /var/cache/conftool/dbconfig/20220712-060058-marostegui.json
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T311610|T311610]]', diff saved to https://phabricator.wikimedia.org/P31008 and previous config saved to /var/cache/conftool/dbconfig/20220712-060031-marostegui.json
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 06:00 marostegui: Starting s3 eqiad failover from db1123 to db1157 - [[phab:T311610|T311610]]
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1157 with weight 0 [[phab:T311610|T311610]]', diff saved to https://phabricator.wikimedia.org/P31007 and previous config saved to /var/cache/conftool/dbconfig/20220712-051927-root.json
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: Primary switchover s3 [[phab:T311610|T311610]]
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: Primary switchover s3 [[phab:T311610|T311610]]
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 00:10 ejegg: updated payments-wiki from {{Gerrit|53a7b7bd}} to {{Gerrit|2f95d8b4}}
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and  shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2022-07-11 ==
== 2022-09-25 ==
* 21:49 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3ba1d4c]: subgraph_query_mapping_daily: Increase partitioning to 2048 (duration: 02m 02s)
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 21:47 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3ba1d4c]: subgraph_query_mapping_daily: Increase partitioning to 2048
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 20:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@a559f82]: subgraph: Use HivePartitionRangeSensor to wait for sparql queries (duration: 02m 00s)
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 20:36 TheresNoTime: UTC late deploys done
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 20:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@a559f82]: subgraph: Use HivePartitionRangeSensor to wait for sparql queries
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 20:28 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:812897{{!}}Migrate WikibaseTermboxInteraction from EventLogging to EventGate on all wikis (T290303)]] (duration: 02m 53s)
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 20:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I82262ef6773ab228}} try again ref [[phab:T311788|T311788]] (duration: 03m 07s)
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 19:41 hashar@deploy1002: Finished deploy [integration/docroot@fc5d65a]: Add language-data library (duration: 00m 08s)
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 19:41 hashar@deploy1002: Started deploy [integration/docroot@fc5d65a]: Add language-data library
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P31005 and previous config saved to /var/cache/conftool/dbconfig/20220711-193315-marostegui.json
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 18:32 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 17:10 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 16:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@02ab1c2]: use mode=reschedule on all airflow sensors (duration: 02m 02s)
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 16:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@02ab1c2]: use mode=reschedule on all airflow sensors
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 16:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1005.wikimedia.org with OS bullseye
* 16:11 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I82262ef6773ab228}} (duration: 02m 55s)
* 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 15:56 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2175.codfw.wmnet with OS bullseye
* 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1005.wikimedia.org with reason: host reimage
* 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1005.wikimedia.org with reason: host reimage
* 15:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:812892{{!}} Bumping portals to master (T128546)]] (duration: 02m 51s)
* 15:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2175.codfw.wmnet with reason: host reimage
* 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:812892{{!}} Bumping portals to master (T128546)]] (duration: 02m 58s)
* 15:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2175.codfw.wmnet with reason: host reimage
* 15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:28 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:27 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:23 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2175.codfw.wmnet with OS bullseye
* 15:08 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:34 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:11 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 14:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 13:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:53 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:53 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:53 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:05 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2163 to s8 [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P31002 and previous config saved to /var/cache/conftool/dbconfig/20220711-130441-marostegui.json
* 12:05 moritzm: updated bullseye netboot image for Bullseye 11.4 point release [[phab:T312637|T312637]]
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AniketArs out of all services on: 1292 hosts
* 10:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AniketArs out of all services on: 1292 hosts
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AniketArs out of all services on: 663 hosts
* 10:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AniketArs out of all services on: 663 hosts
* 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2027.codfw.wmnet to cluster codfw and group A
* 08:06 godog: trim thanos raw samples retention to 54w - [[phab:T311690|T311690]]
* 08:04 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2027.codfw.wmnet to cluster codfw and group A
* 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
* 07:52 godog: roll-restart swift-account swift-container across swift/thanos bullseye hosts - [[phab:T297959|T297959]]
* 07:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
* 07:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:43 taavi@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/PageTriage/includes/HookHandlers/UndeleteHookHandler.php: Backport: [[gerrit:812532{{!}}UndeleteHookHandler: fix namespace conditional (T311347)]] (duration: 02m 54s)
* 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS bullseye
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2080 from dbtcl [[phab:T312618|T312618]]', diff saved to https://phabricator.wikimedia.org/P30999 and previous config saved to /var/cache/conftool/dbconfig/20220711-073346-marostegui.json
* 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2080.codfw.wmnet
* 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2027.codfw.wmnet with reason: host reimage
* 07:26 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 07:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2027.codfw.wmnet with reason: host reimage
* 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2080.codfw.wmnet
* 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS bullseye
* 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2077.codfw.wmnet
* 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2077.codfw.wmnet
* 06:28 _joe_: repool thumbor1005
* 06:28 _joe_: depooled thumbor1005, downgraded firejail, restarted units
* 00:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply


== 2022-07-10 ==
== 2022-09-23 ==
* 13:48 godog: silence ProbeDown pages for thumbor:8800 until wed
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance


== 2022-07-09 ==
== 2022-09-22 ==
* 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:48 krinkle@deploy1002: Synchronized php-1.39.0-wmf.19/includes/ResourceLoader/: {{Gerrit|I3e43b10d26858c5b}} (duration: 03m 37s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:23 dancy@deploy1002: backport aborted:  (duration: 00m 05s)
* 01:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:55 brennen: end of utc late backport & config window
* 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:35 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I1bb97d1d601}} (duration: 03m 24s)
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 01:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 20:36 brennen@deploy1002: backport aborted:  (duration: 02m 16s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 18:38 dancy@deploy1002: Started scap: testing
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:39 dancy@deploy1002: Sync cancelled.
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2022-07-08 ==
== 2022-09-21 ==
* 21:44 ryankemper: [Elastic] Reshuffled shards on eqiad to get cluster back into green status (from yellow): https://phabricator.wikimedia.org/P30995#130117
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:32 ori: apt1001: reprepro -C main include buster-wikimedia libvmod-querysort_0.2_amd64.changes
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:58 thcipriani: quick phab downtime for deploy to fix [[phab:T312614|T312614]]
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:57 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 19:57 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 19:57 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 19:56 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 19:56 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 19:56 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 19:49 tzatziki: removing 2 files for legal compliance
* 18:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1001.wikimedia.org with OS bullseye
* 18:26 urandom: changing Cassandra superuser password, AQS cluster -- [[phab:T311652|T311652]]
* 18:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage
* 18:18 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1001.wikimedia.org with reason: host reimage
* 18:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1001.wikimedia.org with OS bullseye
* 16:25 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:29 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:27 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:15 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:59 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1005.wikimedia.org with OS bullseye
* 14:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1004.wikimedia.org with OS bullseye
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30990 and previous config saved to /var/cache/conftool/dbconfig/20220708-143411-root.json
* 14:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage
* 14:22 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1004.wikimedia.org with reason: host reimage
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30983 and previous config saved to /var/cache/conftool/dbconfig/20220708-141907-root.json
* 14:11 hashar@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: AddImage: Only process metadata for a single valid suggestion - [[phab:T312544|T312544]] (duration: 03m 25s)
* 14:09 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30978 and previous config saved to /var/cache/conftool/dbconfig/20220708-140404-root.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30975 and previous config saved to /var/cache/conftool/dbconfig/20220708-134900-root.json
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30974 and previous config saved to /var/cache/conftool/dbconfig/20220708-133356-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30973 and previous config saved to /var/cache/conftool/dbconfig/20220708-131852-root.json
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30971 and previous config saved to /var/cache/conftool/dbconfig/20220708-130348-root.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30970 and previous config saved to /var/cache/conftool/dbconfig/20220708-124844-root.json
* 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deneb.codfw.wmnet
* 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts deneb.codfw.wmnet
* 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage
* 09:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2027.codfw.wmnet with reason: Temporarily remove from Ganeti cluster for reimage
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2016.codfw.wmnet to cluster codfw and group D
* 07:33 akosiaris: reboot rdb1009 for kernel upgrades
* 07:29 vgutierrez: restart  pybal on lvs6002
* 07:22 akosiaris: reboot rdb1010 for kernel upgrades
* 06:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2016.codfw.wmnet to cluster codfw and group D
* 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
* 06:47 TimStarling: on mwmaint2002: using iptables to simulate cross-DC memcached traffic loss
* 06:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
* 06:05 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Switch $wgCentralAuthTokenCacheType to mcrouter-primary-dc (duration: 03m 18s)
* 06:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2016.codfw.wmnet with OS bullseye
* 06:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2077 from dbctl [[phab:T312191|T312191]]', diff saved to https://phabricator.wikimedia.org/P30963 and previous config saved to /var/cache/conftool/dbconfig/20220708-055334-marostegui.json
* 05:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2016.codfw.wmnet with reason: host reimage
* 05:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2016.codfw.wmnet with reason: host reimage
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2076.codfw.wmnet
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2076.codfw.wmnet
* 05:31 moritzm: draining ganeti2027 [[phab:T311686|T311686]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2076 from dbctl [[phab:T312190|T312190]]', diff saved to https://phabricator.wikimedia.org/P30962 and previous config saved to /var/cache/conftool/dbconfig/20220708-052926-marostegui.json
* 05:26 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2016.codfw.wmnet with OS bullseye
* 05:23 marostegui: dbmaint s3@eqiad [[phab:T312574|T312574]]
* 04:08 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@b5d49fe]: use mode=reschedule on all airflow sensors (duration: 02m 03s)
* 04:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@b5d49fe]: use mode=reschedule on all airflow sensors
* 03:33 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 03:22 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1004.wikimedia.org with OS bullseye
* 02:27 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@c271774]: Update rdf-spark-tools to 0.3.112 (duration: 02m 13s)
* 02:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1004.wikimedia.org with OS bullseye
* 02:25 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 02:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@c271774]: Update rdf-spark-tools to 0.3.112
* 02:12 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: RL use MainStash on dewiki {{Gerrit|I1c120d64d226}} (duration: 03m 21s)
* 01:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2182.codfw.wmnet with OS bullseye
* 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
* 01:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
* 01:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS bullseye
* 01:12 mutante: gitlab1004 - _still_ icinga alerts about rsync to decom'ed host. 'systemctl daemon-reload' to teach it about deleted units, then systemctl reset failed ..then RECOVERY [[phab:T307142|T307142]]
* 00:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2181.codfw.wmnet with OS bullseye
 
== 2022-07-07 ==
* 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: host reimage
* 23:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: host reimage
* 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2180.codfw.wmnet with OS bullseye
* 23:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: host reimage
* 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2181.codfw.wmnet with OS bullseye
* 23:26 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I9b97f79618}} (duration: 03m 23s)
* 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2180.codfw.wmnet with reason: host reimage
* 23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2179.codfw.wmnet with OS bullseye
* 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
* 22:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
* 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:56 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|I1f2daab316}} (duration: 03m 43s)
* 22:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2178.codfw.wmnet with reason: host reimage
* 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2178.codfw.wmnet with reason: host reimage
* 22:42 krinkle@deploy1002: Synchronized wmf-config/missing.php: {{Gerrit|I13a4ba0e307a}} (duration: 03m 33s)
* 22:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2179.codfw.wmnet with OS bullseye
* 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2177.codfw.wmnet with OS bullseye
* 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2178.codfw.wmnet with OS bullseye
* 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2176.codfw.wmnet with OS bullseye
* 22:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
* 22:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
* 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2176.codfw.wmnet with reason: host reimage
* 21:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2177.codfw.wmnet with OS bullseye
* 21:33 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Ice5302f791fb1d5}} (duration: 03m 18s)
* 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:28 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: {{Gerrit|Ice5302f791fb1d5}} (duration: 03m 18s)
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2177.codfw.wmnet with OS bullseye
* 20:55 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:812017{{!}}Migrate WikibaseTermboxInteraction from EventLogging to EventGate on testwiki (T290303)]] (duration: 03m 12s)
* 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e0a8f03]: tune subgraph_mapping_weekly based on first prod run (duration: 02m 05s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:49 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e0a8f03]: tune subgraph_mapping_weekly based on first prod run
* 20:46 tgr_: UTC late deploys done
* 20:49 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.19/includes/parser/ParserOutput.php: Backport: [[gerrit:811961{{!}}ParserOutput::mergeMapStrategy: don't crash if merging non-array values (T312242)]] (duration: 03m 05s)
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:38 thcipriani@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: Config: [[gerrit:791729{{!}}Enable VisualEditor on thwikibooks by default (T308379)]] (duration: 03m 13s)
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 20:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2176.codfw.wmnet with OS bullseye
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 thcipriani@deploy1002: Synchronized wmf-config/config/thwikibooks.yaml: Config: [[gerrit:791729{{!}}Enable VisualEditor on thwikibooks by default (T308379)]] (duration: 03m 25s)
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2181.mgmt.codfw.wmnet with reboot policy FORCED
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2182.mgmt.codfw.wmnet with reboot policy FORCED
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 20:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host kafka-jumbo1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudweb1004.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudweb1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc1001.eqiad.wmnet
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 20:11 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 20:08 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 20:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 20:03 mutante: destroying former strech backend of doc.wikimedia.org, replaced by doc1002 on buster ([[phab:T247653|T247653]])
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:03 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts doc1001.eqiad.wmnet
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.wikimedia.org with OS bullseye
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2182.mgmt.codfw.wmnet with reboot policy FORCED
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2181.mgmt.codfw.wmnet with reboot policy FORCED
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2180.mgmt.codfw.wmnet with reboot policy FORCED
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2179.mgmt.codfw.wmnet with reboot policy FORCED
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.wikimedia.org with reason: host reimage
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 19:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.wikimedia.org with reason: host reimage
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.wikimedia.org with OS bullseye
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:16 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 19:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:10 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:07 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 19:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 19:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.wikimedia.org with reason: host reimage
* 19:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2180.mgmt.codfw.wmnet with reboot policy FORCED
* 18:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.wikimedia.org with reason: host reimage
* 18:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2179.mgmt.codfw.wmnet with reboot policy FORCED
* 18:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2178.mgmt.codfw.wmnet with reboot policy FORCED
* 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.wikimedia.org with OS bullseye
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
* 18:46 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1006.wikimedia.org with OS bullseye
* 18:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
* 18:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 18:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 18:26 brett@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2178.mgmt.codfw.wmnet with reboot policy FORCED
* 18:22 brett@cumin1001: START - Cookbook sre.dns.netbox
* 18:22 brett@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
* 18:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:16 brett@cumin1001: START - Cookbook sre.dns.netbox
* 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
* 18:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:02 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
* 17:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
* 17:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
* 17:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:51 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 17:51 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 17:39 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2175.mgmt.codfw.wmnet with reboot policy FORCED
* 17:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2177.mgmt.codfw.wmnet with reboot policy FORCED
* 17:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2176.mgmt.codfw.wmnet with reboot policy FORCED
* 17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1002.wikimedia.org with OS bullseye
* 17:22 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1006.wikimedia.org with OS bullseye
* 17:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.wikimedia.org with OS bullseye
* 17:12 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:12 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:11 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:11 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:10 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:10 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1002.wikimedia.org with reason: host reimage
* 17:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1002.wikimedia.org with reason: host reimage
* 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.wikimedia.org with OS bullseye
* 16:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 16:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 16:49 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1002.wikimedia.org with OS bullseye
* 16:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bullseye
* 16:48 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 16:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.wikimedia.org with OS bullseye
* 16:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.wikimedia.org with reason: host reimage
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.wikimedia.org with reason: host reimage
* 16:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1001.wikimedia.org with OS bullseye
* 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 16:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1003.wikimedia.org with OS bullseye
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
* 16:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.wikimedia.org with OS bullseye
* 16:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
* 16:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1003.wikimedia.org with reason: host reimage
* 16:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
* 16:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 16:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1003.wikimedia.org with reason: host reimage
* 16:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bullseye
* 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30959 and previous config saved to /var/cache/conftool/dbconfig/20220707-160308-root.json
* 16:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 16:01 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 16:01 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 15:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
* 15:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30958 and previous config saved to /var/cache/conftool/dbconfig/20220707-154804-root.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30957 and previous config saved to /var/cache/conftool/dbconfig/20220707-153300-root.json
* 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30956 and previous config saved to /var/cache/conftool/dbconfig/20220707-151756-root.json
* 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2016.codfw.wmnet with reason: Drop from ganeti cluster for eventual reimage
* 15:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2016.codfw.wmnet with reason: Drop from ganeti cluster for eventual reimage
* 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2010.codfw.wmnet to cluster codfw and group C
* 15:09 moritzm: installing containerd security updates
* 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30955 and previous config saved to /var/cache/conftool/dbconfig/20220707-150252-root.json
* 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to cluster codfw and group C
* 14:54 reedy@deploy1002: Synchronized composer.json: Cleanup (duration: 03m 19s)
* 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
* 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30953 and previous config saved to /var/cache/conftool/dbconfig/20220707-144748-root.json
* 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
* 14:41 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2010.codfw.wmnet to cluster codfw and group C
* 14:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2010.codfw.wmnet to cluster codfw and group C
* 14:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30952 and previous config saved to /var/cache/conftool/dbconfig/20220707-143244-root.json
* 14:28 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
* 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
* 14:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:23 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 14:23 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30951 and previous config saved to /var/cache/conftool/dbconfig/20220707-141740-root.json
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1132', diff saved to https://phabricator.wikimedia.org/P30950 and previous config saved to /var/cache/conftool/dbconfig/20220707-141724-marostegui.json
* 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 14:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 13:49 moritzm: draining ganeti2016 [[phab:T311686|T311686]]
* 13:44 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: {{Gerrit|95c38bd0416d2bb14404526bdf8106ef033e77b3}}: ServiceImageRecommendationProvider: Dont fail on first validation error ([[phab:T312521|T312521]]) (duration: 03m 24s)
* 13:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/Translate/tag/PageTranslationHooks.php: {{Gerrit|af5174528885a72230be7346e355261383e91b5c}}:  Translation unit deletion: Skip translation update if it doesnt exist ([[phab:T312293|T312293]]) (duration: 03m 32s)
* 13:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:31 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|aa1d8c8ce27c4e19a621250b6fb3cefdb6b64574}}: GrowthExperiments: Set GEImageRecommendationApiHandler ([[phab:T306032|T306032]]; 2/2) (duration: 03m 37s)
* 13:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:27 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: {{Gerrit|aa1d8c8ce27c4e19a621250b6fb3cefdb6b64574}}: GrowthExperiments: Set GEImageRecommendationApiHandler ([[phab:T306032|T306032]]; 1/2) (duration: 03m 20s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:24 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: {{Gerrit|df1393f10f987f7ddb41aea135ca43dcc6372715}}: ServiceImageRecommendationProvider: Dont fail on first validation error ([[phab:T312521|T312521]]) (duration: 03m 30s)
* 13:21 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ganeti2010.codfw.wmnet with OS bullseye
* 13:21 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
* 13:20 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 elukey: roll restart eventgate-main pods to add a new stream - [[phab:T301878|T301878]]
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2165 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P30948 and previous config saved to /var/cache/conftool/dbconfig/20220707-131852-marostegui.json
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2 refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2010.codfw.wmnet with reason: host reimage
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 13:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2010.codfw.wmnet with reason: host reimage
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul
* 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 12:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2010.codfw.wmnet with OS bullseye
* 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
* 12:37 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 12:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
* 12:22 moritzm: draining ganeti2015 [[phab:T311686|T311686]]
* 11:53 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 11:49 jayme: rolling back helm release eventstreams-internal/main to revision 3 on eqiad and codfw clusters because it's pending-upgrade since Mon Mar 21 21:36:56 2022 / Mon Mar 21 16:05:54 2022
* 11:42 jayme@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 11:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
* 11:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
* 11:40 jayme: rolling back helm release tegola-vector-tiles/main to revision 11 on staging-eqiad because it's pending-upgrade since Mon Jun 27 09:45:56 2022
* 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 11:00 moritzm: installing intel-microcode security updates
* 10:59 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
* 10:59 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
* 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
* 10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
* 10:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
* 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
* 10:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 10:32 moritzm: draining ganeti2010 [[phab:T311686|T311686]]
* 10:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 09:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:44 moritzm: installing 5.10.120-1~bpo10+1 kernels on buster hosts running Linux 5.10
* 09:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8599f395bd3af2b27aa06cdc318d44e97efc8119}}: Declare mediawiki.editgrowthconfig schema ([[phab:T312148|T312148]]) (duration: 03m 37s)
* 09:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:38 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 09:37 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 09:35 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 09:33 marostegui: dbmaint s3@eqiad [[phab:T312285|T312285]]
* 09:33 marostegui: dbmaint s7@eqiad [[phab:T312285|T312285]]
* 09:33 marostegui: dbmaint s2@eqiad [[phab:T312285|T312285]]
* 09:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbstore1007.eqiad.wmnet
* 09:31 marostegui: dbmaint s6@eqiad [[phab:T312285|T312285]]
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2161 to dbctl [[phab:T311493|T311493]]', diff saved to https://phabricator.wikimedia.org/P30940 and previous config saved to /var/cache/conftool/dbconfig/20220707-092424-marostegui.json
* 09:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1007.eqiad.wmnet
* 09:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbstore1005.eqiad.wmnet
* 09:17 moritzm: draining ganeti2009 [[phab:T311686|T311686]]
* 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1005.eqiad.wmnet
* 09:11 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dbstore1003.eqiad.wmnet
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type back to plain
* 09:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2004.codfw.wmnet with reason: Switch disk type back to plain
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2074', diff saved to https://phabricator.wikimedia.org/P30938 and previous config saved to /var/cache/conftool/dbconfig/20220707-090700-marostegui.json
* 09:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dbstore1003.eqiad.wmnet
* 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2074.codfw.wmnet
* 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 08:47 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2074.codfw.wmnet
* 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:08 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.19 refs [[phab:T308072|T308072]]
* 07:31 marostegui: dbmaint s3@eqiad [[phab:T312286|T312286]]
* 07:29 marostegui: dbmaint s7@eqiad [[phab:T312286|T312286]]
* 07:29 marostegui: dbmaint s2@eqiad [[phab:T312286|T312286]]
* 07:28 marostegui: dbmaint s6@eqiad [[phab:T312286|T312286]]
* 07:27 apergos: UTC morning backport and config training window closed
* 07:23 marostegui: dbmaint s3@eqiad [[phab:T312287|T312287]]
* 07:20 marostegui: dbmaint s6@eqiad [[phab:T312287|T312287]]
* 07:19 marostegui: dbmaint s7@eqiad [[phab:T312287|T312287]]
* 07:19 marostegui: dbmaint s2@eqiad [[phab:T312287|T312287]]
* 07:14 kartik@deploy1002: Synchronized php-1.39.0-wmf.19/extensions/ContentTranslation/modules/mw.cx.MachineTranslationManager.js: Backport: [[gerrit:811806{{!}}Update MT label for Flores (T311411)]] (duration: 03m 20s)
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:07 kartik@deploy1002: Synchronized php-1.39.0-wmf.18/extensions/ContentTranslation/modules/mw.cx.MachineTranslationManager.js: Backport: [[gerrit:811425{{!}}Update MT label for Flores (T311411)]] (duration: 03m 41s)
* 07:07 moritzm: drain ganeti1020 [[phab:T308331|T308331]]
* 07:07 marostegui: dbmaint s3@eqiad [[phab:T312288|T312288]]
* 07:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:03 marostegui: dbmaint s6@eqiad [[phab:T312288|T312288]]
* 07:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:00 marostegui: dbmaint s2@eqiad [[phab:T312288|T312288]]
* 06:56 marostegui: dbmaint s7@eqiad [[phab:T312288|T312288]]
* 06:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
* 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
* 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1160 [[phab:T311611|T311611]]', diff saved to https://phabricator.wikimedia.org/P30937 and previous config saved to /var/cache/conftool/dbconfig/20220707-060743-ladsgroup.json
* 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 primary and set section read-write [[phab:T311611|T311611]]', diff saved to https://phabricator.wikimedia.org/P30936 and previous config saved to /var/cache/conftool/dbconfig/20220707-060112-ladsgroup.json
* 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - [[phab:T311611|T311611]]', diff saved to https://phabricator.wikimedia.org/P30935 and previous config saved to /var/cache/conftool/dbconfig/20220707-060037-ladsgroup.json
* 06:00 Amir1: Starting s4 eqiad failover from db1160 to db1138 - [[phab:T311611|T311611]]
* 05:35 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 [[phab:T311611|T311611]]', diff saved to https://phabricator.wikimedia.org/P30933 and previous config saved to /var/cache/conftool/dbconfig/20220707-051406-ladsgroup.json
* 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s4 [[phab:T311611|T311611]]
* 05:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s4 [[phab:T311611|T311611]]
* 01:09 mutante: gitlab1004 - systemctl reset-failed, clear icinga alerts about rsync to decom'ed machine
* 00:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 00:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 00:25 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab1001.wikimedia.org
* 00:25 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)


== 2022-07-06 ==
== 2022-09-20 ==
* 23:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5082f17]: increase subgraph_mapping_weekly executor memory (duration: 02m 05s)
* 20:19 cjming: end of UTC late backport window
* 23:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5082f17]: increase subgraph_mapping_weekly executor memory
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:30 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 23:25 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab1001.wikimedia.org
* 23:00 mutante: gitlab1004 - rm /lib/systemd/system/rsync-config-backup-gitlab1001*  [[phab:T307142|T307142]]
* 22:52 mutante: etherpad - deleted 2 pads that had leaked information
* 22:52 ebernhardson: restart airflow-webserver and airflow-scheduler for plugins update on an-airflow1001
* 22:37 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@debd402]: airflow dags to generate subgraph and query mapping along with their metrics (duration: 02m 01s)
* 22:35 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@debd402]: airflow dags to generate subgraph and query mapping along with their metrics
* 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1005.wikimedia.org with OS bullseye
* 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
* 21:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
* 21:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1003.wikimedia.org with OS bullseye
* 21:39 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1002.wikimedia.org with OS bullseye
* 20:59 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1001.wikimedia.org with OS bullseye
* 20:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.wikimedia.org with OS bullseye
* 20:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
* 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
* 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye
* 20:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bullseye
* 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
* 20:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1001.wikimedia.org with OS bullseye
* 20:35 cjming: end of UTC late backport window
* 20:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811762{{!}}Enable sticky header edit A/B test for pilot wikis excluding idwiki/viwiki (T311144)]] (duration: 03m 25s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:54 bd808@mwmaint1002: Testing statshbot following deploy of [[gerrit:809732]]. This should be logged in SAL, but stashbot should not say that was done on irc.
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 19:13 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 19:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 18:48 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 18:48 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 18:47 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 18:45 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 18:45 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 18:10 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 18:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster reimage to bullseye - bking@cumin1001 - [[phab:T309343|T309343]]
* 18:50 jynus: restart db2100:s7 to apply new config
* 17:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 17:55 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudcephmon1002.eqiad.wmnet with reason: Moving racks
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:55 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudcephmon1002.eqiad.wmnet with reason: Moving racks
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 17:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 17:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:06 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 38s)
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:06 inflatador: bking@cloudelastic1006 "restarting elastic services in preparation for cloudelastic reimage [[phab:T309343|T309343]]"
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:07 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl1002.eqiad.wmnet
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1002.eqiad.wmnet on all recursors
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 15:57 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1002.eqiad.wmnet on all recursors
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 15:53 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:53 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1002.eqiad.wmnet
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 15:51 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl1001.eqiad.wmnet
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 15:41 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 15:37 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 15:37 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1001.eqiad.wmnet
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 15:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 15:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 15:09 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 41s)
* 16:42 dancy@deploy1002: Sync cancelled.
* 15:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 15:05 moritzm: installing intel-microcode security updates
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:09 awight@deploy1002: backport aborted:  (duration: 00m 33s)
* 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 15:00 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 28s)
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:56 cmjohnson1: moving switch ports cloudcephosd1021 from cloudsw1-c to cloudsw2-c [[phab:T310546|T310546]]
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 14:53 akosiaris: reboot poolcounter1005 for kernel upgrades
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:49 akosiaris@deploy1002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 03m 33s)
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: {{Gerrit|1ac09d4709c645558f644a885fadc49c05cc04b9}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 39s)
* 14:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1009.wikimedia.org
* 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: {{Gerrit|1a27e05a7ca53a063d5f9e284d6a09546ac8691c}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 52s)
* 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:34 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:32 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:32 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:30 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
* 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
* 14:30 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
* 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
* 14:27 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:26 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:22 akosiaris: depool eqiad kartotherian [[phab:T305845|T305845]]
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:22 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:17 akosiaris: pool codfw for kartotherian [[phab:T305845|T305845]]
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0b55db6f80df5f4c89f969332a6b31077a7172c4}}: Enable Tech Wishes survey on dewiki ([[phab:T316676|T316676]]) (duration: 04m 12s)
* 14:16 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1009.wikimedia.org
* 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:15 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1008.wikimedia.org
* 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
* 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
* 14:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:35 hashar: Restarted CI Jenkins for plugin update
* 14:05 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1008.wikimedia.org
* 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832993{{!}}testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias (T317289)]] (duration: 03m 46s)
* 13:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:54 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.addnode (exit_code=97) for new host ganeti2024.codfw.wmnet to cluster codfw and group A
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:810007{{!}}Add a new Eventgate stream for revision-score events (T301878)]] (duration: 03m 46s)
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2024.codfw.wmnet to cluster codfw and group A
* 07:10 kart_: Updated cxserver to 2022-09-15-113346-production ([[phab:T317289|T317289]], [[phab:T315209|T315209]])
* 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
* 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 13:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1003.eqiad.wmnet
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:28 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
* 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 13:28 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:28 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
* 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 13:20 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
* 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 13:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
* 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 13:19 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
* 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 13:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
* 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1132 ([[phab:T311106|T311106]])', diff saved to https://phabricator.wikimedia.org/P30930 and previous config saved to /var/cache/conftool/dbconfig/20220706-131715-ladsgroup.json
* 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
* 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
* 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
* 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudstore1009.wikimedia.org
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:04 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudstore1009.wikimedia.org
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
* 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudstore1008.wikimedia.org
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 36m 08s)
* 13:03 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudstore1008.wikimedia.org
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
* 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
* 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudstore1008.wikimedia.org
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 12:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:43 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudstore1008.wikimedia.org
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1003.eqiad.wmnet on all recursors
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:41 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1003.eqiad.wmnet on all recursors
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:41 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:40 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-codfw
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:28 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
* 12:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
* 12:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
* 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
* 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
* 12:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
* 12:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
* 11:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
* 11:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
* 11:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
* 11:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1032.eqiad.wmnet
* 11:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1032.eqiad.wmnet
* 11:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
* 11:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2031.codfw.wmnet
* 11:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
* 11:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
* 11:28 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2031.codfw.wmnet
* 11:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
* 11:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
* 11:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
* 11:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
* 11:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
* 11:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P30927 and previous config saved to /var/cache/conftool/dbconfig/20220706-110658-root.json
* 10:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
* 10:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
* 10:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
* 10:54 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 10:54 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1003.eqiad.wmnet
* 10:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
* 10:52 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1002.eqiad.wmnet
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P30925 and previous config saved to /var/cache/conftool/dbconfig/20220706-105154-root.json
* 10:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
* 10:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
* 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1002.eqiad.wmnet on all recursors
* 10:42 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1002.eqiad.wmnet on all recursors
* 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet
* 10:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1009.eqiad.wmnet
* 10:37 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 10:37 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1002.eqiad.wmnet
* 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd1001.eqiad.wmnet
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P30923 and previous config saved to /var/cache/conftool/dbconfig/20220706-103650-root.json
* 10:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet
* 10:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1009.eqiad.wmnet
* 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd1001.eqiad.wmnet on all recursors
* 10:27 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd1001.eqiad.wmnet on all recursors
* 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:22 btullis@cumin1001: START - Cookbook sre.dns.netbox
* 10:22 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd1001.eqiad.wmnet
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P30921 and previous config saved to /var/cache/conftool/dbconfig/20220706-102146-root.json
* 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:19 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2024.codfw.wmnet
* 10:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
* 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P30920 and previous config saved to /var/cache/conftool/dbconfig/20220706-100642-root.json
* 10:02 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
* 09:59 volans: restarted wikibugs
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P30919 and previous config saved to /var/cache/conftool/dbconfig/20220706-095138-root.json
* 09:50 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30918 and previous config saved to /var/cache/conftool/dbconfig/20220706-093752-root.json
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30917 and previous config saved to /var/cache/conftool/dbconfig/20220706-093741-root.json
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30916 and previous config saved to /var/cache/conftool/dbconfig/20220706-093733-root.json
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P30915 and previous config saved to /var/cache/conftool/dbconfig/20220706-093634-root.json
* 09:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:23 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2024.codfw.wmnet
* 09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30914 and previous config saved to /var/cache/conftool/dbconfig/20220706-092248-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30913 and previous config saved to /var/cache/conftool/dbconfig/20220706-092237-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30912 and previous config saved to /var/cache/conftool/dbconfig/20220706-092229-root.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P30911 and previous config saved to /var/cache/conftool/dbconfig/20220706-092130-root.json
* 09:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P30908 and previous config saved to /var/cache/conftool/dbconfig/20220706-091717-root.json
* 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:15 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1039.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:14 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1039.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:14 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1038.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:13 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1038.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:13 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1037.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:11 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1037.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:11 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1036.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:10 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1036.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:10 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1035.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
* 09:09 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1035.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30907 and previous config saved to /var/cache/conftool/dbconfig/20220706-090744-root.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30906 and previous config saved to /var/cache/conftool/dbconfig/20220706-090731-root.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30905 and previous config saved to /var/cache/conftool/dbconfig/20220706-090725-root.json
* 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:04 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2039.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:02 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2039.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:02 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2038.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:01 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2038.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:01 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2037.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:00 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2037.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 09:00 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2036.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:58 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2036.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:55 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2034.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:54 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2034.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:54 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2033.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:53 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2033.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:53 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2032.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30904 and previous config saved to /var/cache/conftool/dbconfig/20220706-085240-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30903 and previous config saved to /var/cache/conftool/dbconfig/20220706-085227-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30902 and previous config saved to /var/cache/conftool/dbconfig/20220706-085221-root.json
* 08:51 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2032.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:51 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2031.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:50 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2031.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:50 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2030.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:48 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2030.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:48 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2029.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:47 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2029.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:47 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be2028.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:46 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be2028.codfw.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:43 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1033.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:41 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1033.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:41 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1032.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:40 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1032.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:40 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1031.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:39 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1031.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:39 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1030.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:37 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1030.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:37 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1029.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30901 and previous config saved to /var/cache/conftool/dbconfig/20220706-083736-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30900 and previous config saved to /var/cache/conftool/dbconfig/20220706-083723-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30899 and previous config saved to /var/cache/conftool/dbconfig/20220706-083718-root.json
* 08:36 mvernon@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1029.eqiad.wmnet: Renew puppet certificate - mvernon@cumin1001
* 08:26 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1033.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Test done ([[phab:T311106|T311106]])', diff saved to https://phabricator.wikimedia.org/P30898 and previous config saved to /var/cache/conftool/dbconfig/20220706-082603-ladsgroup.json
* 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Test done ([[phab:T311106|T311106]])', diff saved to https://phabricator.wikimedia.org/P30897 and previous config saved to /var/cache/conftool/dbconfig/20220706-082540-ladsgroup.json
* 08:25 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1033.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:25 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1032.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:23 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1032.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30896 and previous config saved to /var/cache/conftool/dbconfig/20220706-082232-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30895 and previous config saved to /var/cache/conftool/dbconfig/20220706-082219-root.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30894 and previous config saved to /var/cache/conftool/dbconfig/20220706-082214-root.json
* 08:21 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1031.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:20 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1031.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2024.codfw.wmnet with OS bullseye
* 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1030.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:14 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1030.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:12 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.19  refs [[phab:T308072|T308072]] (duration: 03m 39s)
* 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Test done ([[phab:T311106|T311106]])', diff saved to https://phabricator.wikimedia.org/P30893 and previous config saved to /var/cache/conftool/dbconfig/20220706-081059-ladsgroup.json
* 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Test done ([[phab:T311106|T311106]])', diff saved to https://phabricator.wikimedia.org/P30892 and previous config saved to /var/cache/conftool/dbconfig/20220706-081036-ladsgroup.json
* 08:09 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1029.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:08 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.19  refs [[phab:T308072|T308072]]
* 08:07 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1029.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30891 and previous config saved to /var/cache/conftool/dbconfig/20220706-080728-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30890 and previous config saved to /var/cache/conftool/dbconfig/20220706-080715-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 2%: After maintenance', diff saved to https://phabricator.wikimedia.org/P30889 and previous config saved to /var/cache/conftool/dbconfig/20220706-080710-root.json
* 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2024.codfw.wmnet with reason: host reimage
* 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1028.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 08:01 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for ms-be1028.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001
* 07:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on g