You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(chrisalbon: sudo service uwsgi-ores restart)
imported>Stashbot
(eileen: civicrm upgraded from dcef393d to e198fb4c)
 
(658 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== 2020-09-26 ==
== 2022-09-27 ==
* 19:20 chrisalbon: sudo service uwsgi-ores restart
* 01:17 eileen: civicrm upgraded from {{Gerrit|dcef393d}} to {{Gerrit|e198fb4c}}
* 02:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34924 and previous config saved to /var/cache/conftool/dbconfig/20220927-011543-ladsgroup.json
* 02:04 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
* 00:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.wikimedia.org
* 02:04 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
* 00:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
* 01:56 cdanis: ❌cdanis@cumin2001.codfw.wmnet ~ 🕙🍺 sudo cumin 'A:ores and A:codfw'  'systemctl restart celery-ores-worker.service uwsgi-ores.service '
* 00:40 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
* 01:48 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
* 00:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
* 01:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 00:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
* 01:17 cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service
* 00:16 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
* 01:11 cdanis: ✔️ cdanis@ores2001.codfw.wmnet ~ 🕘🍺 sudo systemctl restart celery-ores-worker.service
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 00:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 00:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudnet1005.eqiad.wmnet
* 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 00:13 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
* 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34923 and previous config saved to /var/cache/conftool/dbconfig/20220927-000525-ladsgroup.json
* 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 00:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
* 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34922 and previous config saved to /var/cache/conftool/dbconfig/20220927-000434-ladsgroup.json


== 2020-09-25 ==
== 2022-09-26 ==
* 23:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a135388]: correct scap variable refernce in airflow_variables (duration: 26m 57s)
* 23:56 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
* 22:36 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a135388]: correct scap variable refernce in airflow_variables
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34921 and previous config saved to /var/cache/conftool/dbconfig/20220926-234928-ladsgroup.json
* 22:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity (duration: 10m 42s)
* 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P34920 and previous config saved to /var/cache/conftool/dbconfig/20220926-233422-ladsgroup.json
* food: updated fundraising CiviCRM from {{Gerrit|eb90dbcfd3}} to {{Gerrit|035ad1c351}}
* 23:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
* 22:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity
* 23:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 21:23 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment (duration: 11m 33s)
* 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34919 and previous config saved to /var/cache/conftool/dbconfig/20220926-231915-ladsgroup.json
* 21:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment
* 23:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2032.codfw.wmnet with OS bullseye
* 20:26 effie: installing memcached 1.4.33-1+deb9u1 on mwdebug1001
* 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 19:34 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1 (duration: 53m 58s)
* 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2032.codfw.wmnet with reason: host reimage
* 18:40 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1
* 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2032.codfw.wmnet with OS bullseye
* 17:47 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/MobileFrontend/: Backport: [[gerrit:630065{{!}}Make all section `collapsible` during server side rendering (T263832)]] (duration: 00m 59s)
* 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2031.codfw.wmnet with OS bullseye
* 17:37 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3 (duration: 02m 01s)
* 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 17:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3
* 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2031.codfw.wmnet with reason: host reimage
* 16:35 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import (duration: 01m 10s)
* 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2031.codfw.wmnet with OS bullseye
* 16:34 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import
* 21:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Promote 1.35.0 to stable in extensiondistributor (duration: 00m 57s)
* 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host centrallog1002.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:23 jynus: fixing enwikivoyage ipblocks inconsistency cluster-wide [[phab:T263842|T263842]]
* 20:31 TheresNoTime: closing UTC late backport window
* 14:54 elukey: install linux-image-4.19-amd64 on an-worker1096 + reboot
* 20:18 samtar@deploy1002: Finished scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] (duration: 06m 52s)
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:13 kormat@cumin1001: dbctl commit (dc=all): 'Add db2113 to various groups [[phab:T263842|T263842]]', diff saved to https://phabricator.wikimedia.org/P12797 and previous config saved to /var/cache/conftool/dbconfig/20200925-121332-kormat.json
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:11 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 11:23 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:835255{{!}}Fix VisualEditor on wikis where RESTBase was never set up (T318325)]]
* 11:10 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation [[phab:T158562|T158562]]
* 20:10 samtar@deploy1002: Finished scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] (duration: 06m 13s)
* 10:42 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:40 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:28 moritzm: reimaging sretest1002 to validate puppetised sources.list with a new installation [[phab:T158562|T158562]]
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:58 moritzm: restarting archiva to pick up Java security update
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2036']
* 09:22 ema: upload@eqsin: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash2036']
* 09:02 ema: text@eqsin: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 20:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2036']
* 06:50 elukey: shutdown ganeti5002 (mistakenly powercycled it without seeing [[phab:T261130|T261130]])
* 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2032']
* 06:40 elukey: powercycle ganeti5002 (no instances running on it, mgmt console shows no tty usable)
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 06:34 elukey: reboot stat1004 to pick up kernel settings
* 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2032']
* 03:10 ejegg: updated payments-wiki from {{Gerrit|f89c594e12}} to {{Gerrit|b2eb456ed1}}
* 20:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2032']
* 02:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: new codfw, [[phab:T263798|T263798]] (duration: 09m 05s)
* 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2031']
* 02:27 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 00m 07s)
* 20:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 02:27 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
* 20:04 samtar@deploy1002: samtar and matmarex: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 02:20 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: new codfw, [[phab:T263798|T263798]]
* 20:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2031']
* 02:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: eqiad-only, [[phab:T263798|T263798]] (duration: 06m 09s)
* 20:03 samtar@deploy1002: Started scap: Backport for [[gerrit:835245{{!}}wgMFMobileFormatterOptions: Set maxImages and maxHeadings to very high values (T317070)]]
* 02:14 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: eqiad-only, [[phab:T263798|T263798]]
* 20:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2031']
* 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34918 and previous config saved to /var/cache/conftool/dbconfig/20220926-195019-ladsgroup.json
* 19:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
* 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS bullseye
* 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 18:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
* 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS bullseye
* 18:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2183.codfw.wmnet with OS bullseye
* 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 18:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2183.codfw.wmnet with reason: host reimage
* 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2183.codfw.wmnet with OS bullseye
* 17:31 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 17:30 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2032.mgmt.codfw.wmnet with reboot policy FORCED
* 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:28 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 17:27 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2037.mgmt.codfw.wmnet with reboot policy FORCED
* 17:27 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host logstash2036.mgmt.codfw.wmnet with reboot policy FORCED
* 17:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2184']
* 17:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2184']
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2183']
* 17:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2183']
* 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
* 17:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:08 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
* 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
* 17:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
* 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2031.mgmt.codfw.wmnet with reboot policy FORCED
* 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34914 and previous config saved to /var/cache/conftool/dbconfig/20220926-170213-ladsgroup.json
* 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34913 and previous config saved to /var/cache/conftool/dbconfig/20220926-170151-ladsgroup.json
* 17:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2032
* 16:56 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2032
* 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2031
* 16:55 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2031
* 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34912 and previous config saved to /var/cache/conftool/dbconfig/20220926-164645-ladsgroup.json
* 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P34911 and previous config saved to /var/cache/conftool/dbconfig/20220926-163138-ladsgroup.json
* 16:26 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2184.mgmt.codfw.wmnet with reboot policy FORCED
* 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34910 and previous config saved to /var/cache/conftool/dbconfig/20220926-162322-ladsgroup.json
* 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:16 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34909 and previous config saved to /var/cache/conftool/dbconfig/20220926-161632-ladsgroup.json
* 16:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34908 and previous config saved to /var/cache/conftool/dbconfig/20220926-160817-ladsgroup.json
* 16:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:04 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 16:03 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34907 and previous config saved to /var/cache/conftool/dbconfig/20220926-155312-ladsgroup.json
* 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:43 volans@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 15:40 ladsgroup@deploy1002: Synchronized portals: Migrate wikiversity.org to the modern portals (duration: 03m 36s)
* 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint Done', diff saved to https://phabricator.wikimedia.org/P34906 and previous config saved to /var/cache/conftool/dbconfig/20220926-153807-ladsgroup.json
* 15:37 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikiversity.org to the modern portals (duration: 03m 49s)
* 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
* 13:59 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031] (duration: 00m 09s)
* 13:59 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@a69b031]: Make Airflow jobs use Spark 3 on anlytics_test [airflow-dags@a69b031]
* 13:56 moritzm: installing mako security updates
* 13:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031] (duration: 00m 10s)
* 13:46 aqu@deploy1002: Started deploy [airflow-dags/analytics@a69b031]: Make Airflow jobs use Spark 3 on anlytics [airflow-dags@a69b031]
* 13:45 Lucas_WMDE: UTC afternoon backport+config window done
* 13:41 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/extension.json: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (2/2) (duration: 03m 39s)
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: Backport: [[gerrit:835130{{!}}Set default sortkey for prefixed pages (T315551)]] (1/2) (duration: 03m 51s)
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:835127{{!}}Enable wgCiteResponsiveReferences on etwiki (T318530)]] (duration: 03m 53s)
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:59 awight@deploy1002: Finished deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production (duration: 02m 40s)
* 12:56 awight@deploy1002: Started deploy [kartotherian/deploy@d1bd7dc]: Enable geopoints on production
* 12:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:51 moritzm: installing bind9 security updates on Bullseye
* 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:51 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] (duration: 06m 05s)
* 12:45 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 12:44 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:835169{{!}}Bump portals to HEAD (T273179)]]
* 12:25 moritzm: installing unzip security updates
* 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34904 and previous config saved to /var/cache/conftool/dbconfig/20220926-094812-ladsgroup.json
* 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34903 and previous config saved to /var/cache/conftool/dbconfig/20220926-094502-ladsgroup.json
* 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
* 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 09:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM matomo1002.eqiad.wmnet
* 08:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|033ab75917932a6b6e1cda8cc26f5f069448e3b9}}: arwiki: Properly grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 46s)
* 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:56 btullis: adding 80GB of virtual disk to matomo1002
* 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a5486780a0543d7fb1c637d2abe48855e753d13}}: arwiki: Grant enrollasmentor to editor ([[phab:T310905|T310905]]) (duration: 03m 40s)
* 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 08:07 godog: upgrade grafana to 8.5.13
* 08:04 godog: add 20G to prometheus/analytics in codfw
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:31 oblivian@deploy1002: Finished scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] (duration: 05m 31s)
* 07:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:26 oblivian@deploy1002: oblivian and oblivian: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 07:26 oblivian@deploy1002: Started scap: Backport for [[gerrit:823681{{!}}Move 100% of cookie-accepting clients to php 7.4 (T271736)]]
* 07:23 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|620bb80e3534c812d7f4de25547d92104b8609a0}}: Add ami, bjn, blk, dag, guw, ig, kcg, lmo, pcm, pwn, and  shi to InterwikiSortOrders (duration: 03m 40s)
* 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]) (duration: 03m 46s)
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|81f66621e923cd2ee3aac6f8b5be0ba2e85fb51d}}: Add wordmark and tagline for mnwiki ([[phab:T318478|T318478]]; 1/2) (duration: 03m 40s)
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:36 elukey: clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition
* 06:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2d2c08fc6e0dd5c0c85fbe31f85201721871aa9}}: eswiki: Enable structured mentor list ([[phab:T310905|T310905]]) (duration: 04m 30s)
* 06:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-09-24 ==
== 2022-09-25 ==
* 23:39 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 01m 58s)
* 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 23:37 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
* 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 21:40 mutante: mw1349 - systemctl reset-failed
* 17:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
* 21:03 cdanis: reprepro: add backported ipvsadm 1:1.31-1+deb10u1 to buster-wikimedia
* 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bullseye
* 21:00 andrew@deploy1001: Finished deploy [horizon/deploy@404e205]: (no justification provided) (duration: 01m 05s)
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 20:59 andrew@deploy1001: Started deploy [horizon/deploy@404e205]: (no justification provided)
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 20:41 andrew@deploy1001: Finished deploy [horizon/deploy@24368a5]: (no justification provided) (duration: 02m 10s)
* 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 20:39 andrew@deploy1001: Started deploy [horizon/deploy@24368a5]: (no justification provided)
* 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 20:35 andrew@deploy1001: Finished deploy [horizon/deploy@85125d1]: (no justification provided) (duration: 00m 52s)
* 15:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 20:34 andrew@deploy1001: Started deploy [horizon/deploy@85125d1]: (no justification provided)
* 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 19:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
* 19:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:26 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 02m 44s)
* 19:54 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 15:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 19:47 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: cloudelastic: envoy sits in front now (duration: 00m 59s)
* 15:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 11s)
* 19:41 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 00m 36s)
* 15:20 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 19:41 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
* 15:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data (duration: 01m 10s)
* 19:39 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 01m 08s)
* 15:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard now uses the dynamicproxy api to fetch zone data
* 19:38 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
* 15:13 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bullseye
* 19:30 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: dev (duration: 00m 44s)
* 19:29 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: dev
* 19:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.10
* 19:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bcf9fcbe3b82ab85b8f97206ceca45b64619c362}}: Enable mobile block notice tracking in MobileFrontend ([[phab:T260218|T260218]]) (duration: 01m 04s)
* 18:58 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:627481{{!}}Enable Special:Investigate on itwiki and svwiki (T262436)]] (duration: 01m 05s)
* 18:01 mutante: temp. disabled puppet on install4001/install5001 - applying install_server role to new servers, starting with install3001
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:24 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:21 jbond42: enable puppet fleet wide post update puppetdb postgres logging
* 17:19 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:17 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:15 jbond42: disable puppet fleet wide to update puppetdb postgres loggin
* 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:11 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:09 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:04 mutante: syncing facts to puppet compiler hosts
* 17:01 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:56 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:26 robh: properly pooled mw1360 this time [[phab:T262151|T262151]]
* 16:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:04 XioNoX: pfw3-eqiad> restart security-log gracefully
* 15:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/AbuseFilter/includes/Hooks/AbuseFilterHookRunner.php: {{Gerrit|5e88c36fa4111cde33dafb0d7ac31a854b95e5ea}}: HookRunner: onAbuseFilterGenerateUserVars should run generateUserVars ([[phab:T263750|T263750]]) (duration: 01m 06s)
* 15:46 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=simplewiki --username="Oversight~simplewiki"` ([[phab:T263760|T263760]])
* 15:44 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=enwiki --username=Oversight` ([[phab:T263760|T263760]])
* 15:43 Urbanecm: Rename all local Oversight accounts but enwiki to Oversight~dbname, see task for full list ([[phab:T263760|T263760]])
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12794 and previous config saved to /var/cache/conftool/dbconfig/20200924-152626-root.json
* 15:15 robh: mw1360 scap and repooled post work via [[phab:T262151|T262151]]
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 66%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12793 and previous config saved to /var/cache/conftool/dbconfig/20200924-151120-root.json
* 15:10 jayme: switched zotero service-proxy listener to use TLS - [[phab:T255869|T255869]]
* 15:00 XioNoX: repool eqiad - [[phab:T256112|T256112]]
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 33%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12792 and previous config saved to /var/cache/conftool/dbconfig/20200924-145617-root.json
* 14:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 14:28 XioNoX: [Netops] In window: turn VC-ports on/off for proper cabling: - [[phab:T256112|T256112]]
* 14:19 XioNoX: remove damping on anycast group for cr2-codfw
* 14:18 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255869|T255869]]
* 14:16 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255869|T255869]]
* 14:16 XioNoX: [Netops] Disable unused VC ports to not risk them going online at connect: - [[phab:T256112|T256112]]
* 14:09 jayme: running puppet on lvs servers - [[phab:T255869|T255869]]
* 14:09 cmjohnson1: removing the cable connected to FPC1:1/0 (DAC 3m) FPC8:1/0 (DAC 3m)
* 13:58 moritzm: upgrading mariadb on cloudcontrol-2001/2003/2004
* 13:52 XioNoX: depool eqiad for row D recabling - [[phab:T256112|T256112]]
* 13:32 ottomata: Increased retention time for *.mediawiki.job.processMediaModeration topics in kafka main-eqiad and main-codfw to 31 days (as per request from Pchelolo )
* 13:22 elukey: moved the hadoop cluster to puppet TLS certificates - [[phab:T253957|T253957]]
* 13:17 XioNoX: add damping to anycast BGP - [[phab:T262372|T262372]]
* 12:58 jayme: switched mathoid service-proxy listener to use TLS - [[phab:T255875|T255875]]
* 12:50 moritzm: upgrading bird on centtrallog1001
* 12:43 gehel: restarting wdqs-categories on wdqs1009
* 12:43 moritzm: installing netty-3.9 security updates
* 12:42 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 12:30 ema: upload@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 12:29 godog: swift codfw-prod: rebalance only, no weight change
* 12:27 kormat: powering off db2125 for maintenance [[phab:T260670|T260670]]
* 12:25 moritzm: installing xorg-server security updates
* 12:09 ema: text@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 12:02 ema: cp4022: upgrade varnish to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 11:40 Urbanecm: EU B&C window done
* 11:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/Translate/tag/TPSection.php: {{Gerrit|fa4900e1e6022e645be12505de30b696a9769e77}}: Fix validation of translation unit section names ([[phab:T263546|T263546]]) (duration: 01m 07s)
* 11:25 jbond42: re-enable puppet fleet wide
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fdab74c443bc3328856e8441f4d2df8bc57c6f54}}: Enable ContentTranslation in Bashkir, Urdu and Welsh WPs as a default tool ([[phab:T258504|T258504]]; [[phab:T260022|T260022]]; [[phab:T260024|T260024]]) (duration: 01m 05s)
* 11:21 jbond42: disable puppet fleet wide to reduce log level on puppetdb
* 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|90c72912f26d91df6d28b1efd64e366aaabc5357}}: Move DiscussionTools out of beta on arwiki, cswiki, huwiki ([[phab:T249394|T249394]]); {{Gerrit|d8553f35b4dd581f67bd568d773ff65f316fbfd3}}: Simplify DiscussionTools config (duration: 01m 11s)
* 11:06 moritzm: installing imagemagick security updates on stretch
* 11:02 jbond42: re-enable puppet fleet wide
* 10:51 jbond42: disable puppet fleet wide to deploy a puppetmaster change
* 10:49 moritzm: installing libproxy security updates
* 10:23 volans: uploaded python3-wmflib_0.0.2 to apt.wikimedia.org buster-wikimedia
* 10:20 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12789 and previous config saved to /var/cache/conftool/dbconfig/20200924-102025-kormat.json
* 10:05 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12788 and previous config saved to /var/cache/conftool/dbconfig/20200924-100521-kormat.json
* 10:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12787 and previous config saved to /var/cache/conftool/dbconfig/20200924-095018-kormat.json
* 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:48 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]]
* 09:46 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255875|T255875]]
* 09:43 jayme: running puppet on lvs servers - [[phab:T255875|T255875]]
* 09:35 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12786 and previous config saved to /var/cache/conftool/dbconfig/20200924-093514-kormat.json
* 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:20 ema: cp4021: repool with varnish 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:19 ema: cp4021: redepool with varnish to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:14 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12785 and previous config saved to /var/cache/conftool/dbconfig/20200924-091445-kormat.json
* 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:14 ema: cp4021: depool and upgrade varnish to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:05 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12784 and previous config saved to /var/cache/conftool/dbconfig/20200924-082443-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12783 and previous config saved to /var/cache/conftool/dbconfig/20200924-082319-root.json
* 08:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:17 volans@cumin1001: START - Cookbook sre.dns.netbox
* 08:15 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:15 XioNoX: configure vrrp_master_pinning in codfw - [[phab:T263212|T263212]]
* 08:10 moritzm: installing mariadb-10.1/mariadb-10.3 updates (packaged version from Debian, not the wmf-mariadb variants we used for mysqld)
* 08:09 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 08:08 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 66%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12782 and previous config saved to /var/cache/conftool/dbconfig/20200924-080816-root.json
* 07:58 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 07:57 marostegui: Remove es2018 from tendril and zarcillo [[phab:T263613|T263613]]
* 07:57 XioNoX: configure vrrp_master_pinning in eqiad - [[phab:T263212|T263212]]
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 33%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12781 and previous config saved to /var/cache/conftool/dbconfig/20200924-075312-root.json
* 07:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 07:49 godog: roll restart logstash codfw, gc death
* 07:25 XioNoX: push pfw policies - [[phab:T263674|T263674]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Place db2073 into vslow, not api in s4', diff saved to https://phabricator.wikimedia.org/P12780 and previous config saved to /var/cache/conftool/dbconfig/20200924-064018-marostegui.json
* 06:22 elukey: powercycle elastic2037 (host stuck, no mgmt serial console working, DIMM errors in racadm getsel)
* 05:57 marostegui: Remove es2012 from tendril and zarcillo [[phab:T263613|T263613]]
* 05:41 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2012 and es2018 from dbctl - [[phab:T263615|T263615]] [[phab:T263613|T263613]]', diff saved to https://phabricator.wikimedia.org/P12778 and previous config saved to /var/cache/conftool/dbconfig/20200924-053001-marostegui.json
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12777 and previous config saved to /var/cache/conftool/dbconfig/20200924-052207-marostegui.json
* 01:25 ryankemper: Root cause of sigkill of `elasticsearch_5@production-logstash-eqiad.service` appears to be OOMKill of the java process: `Killed process 1775 (java) total-vm:8016136kB, anon-rss:4888232kB, file-rss:0kB, shmem-rss:0kB`. Service appears to have restarted itself and is healthy again
* 01:21 ryankemper: Observed that `elasticsearch_5@production-logstash-eqiad.service` is in a `failed` state since `Thu 2020-09-24 00:53:53 UTC`; appears the process received a SIGKILL - not sure why
* 01:19 ryankemper: Getting `connection refused` when trying to `curl -X GET 'http://localhost:9200/_cluster/health'` on `logstash1009`
* 01:16 ryankemper: (after) `<nowiki>{</nowiki>"cluster_name":"production-elk7-codfw","status":"green","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":868,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
* 01:16 ryankemper: Ran `curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed=true'`, cluster status is green again
* 01:15 ryankemper: (before) `<nowiki>{</nowiki>"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
* 01:14 ryankemper: (before) `<nowiki>{</nowiki>"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0


== 2020-09-23 ==
== 2022-09-23 ==
* 23:52 mutante: alert1001 - systemctl restar ircecho because icinga-wm left the chat
* 19:10 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4c973d6]: (no justification provided) (duration: 00m 12s)
* 23:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbd77e3dff0d56b851b3d15b4d267d1faacfae26}}: Add new Racine namespace to frwiktionary ([[phab:T263525|T263525]]) (duration: 01m 05s)
* 19:10 mforns@deploy1002: Started deploy [airflow-dags/analytics@4c973d6]: (no justification provided)
* 23:44 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 17:49 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@7620b25]: (no justification provided) (duration: 00m 10s)
* 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:48 nokafor@deploy1002: Started deploy [airflow-dags/analytics@7620b25]: (no justification provided)
* 23:40 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:39 hashar@deploy1002: Finished scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] (duration: 07m 10s)
* 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|22382a97ec252488a346fbf0c3d40bc974d0cdbe}}: remove wtp2005 from wgLinterSubmitterWhitelist ([[phab:T257903|T257903]]) (duration: 01m 04s)
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:14 eileen: civicrm revision changed from {{Gerrit|32a82aa1b7}} to {{Gerrit|eb90dbcfd3}}, config revision is {{Gerrit|2a55766237}}
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:13 eileen: civicrm revision is {{Gerrit|32a82aa1b7}}, config revision is {{Gerrit|2a55766237}}
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:10 mutante: ganeti5003 - rebooting install5001 - OS install on 3001/4001/5001  [[phab:T263684|T263684]]
* 13:32 hashar@deploy1002: hashar and hashar: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 23:04 mutante: ganeti4003 - rebooting install4001
* 13:31 hashar@deploy1002: Started scap: Backport for [[gerrit:834531{{!}}Stop using Elastica::Type and set the target indices (T318356)]]
* 22:51 mutante: ganeti5003 - rebooting install5001
* 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling (duration: 03m 06s)
* 22:27 mutante: ganeti5003 - gnt-instance start install5001
* 13:26 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-proxy-dashboard improved error handling
* 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:24 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling (duration: 01m 11s)
* 21:38 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:23 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-proxy-dashboard improved error handling
* 21:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:26 jynus: stopping db1117:s3 for maintenance [[phab:T315713|T315713]]
* 21:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 08:51 Emperor: rebalance ms-eqiad swift rings [[phab:T294550|T294550]]
* 21:30 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.10 (duration: 01m 04s)
* 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 21:29 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.10
* 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: Grants fixing
* 21:24 dancy@deploy1001: Finished scap: (no justification provided) (duration: 42m 52s)
* 06:10 marostegui: Shutdown db1189 [[phab:T317662|T317662]]
* 21:12 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 21:06 volans@cumin1001: START - Cookbook sre.dns.netbox
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1189.eqiad.wmnet with reason: on site maintenance
* 20:57 mepps: updated payments-wiki from {{Gerrit|7bb99ce03a}} to {{Gerrit|f89c594e12}}
* 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:42 dancy: dancy@deploy1001 Started scap: Deploying fixes for [[phab:T263601|T263601]] and [[phab:T263675|T263675]] to 1.36.0-wmf.10
* 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 20:41 dancy@deploy1001: Started scap: (no justification provided)
* 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 20:36 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 20:36 eileen: civicrm revision changed from {{Gerrit|a789afd79b}} to {{Gerrit|32a82aa1b7}}, config revision is {{Gerrit|2a55766237}}
* 20:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:30 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 20:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:27 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 20:22 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:18 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 20:08 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:06 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:02 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 19:42 robh: ganeti5002 firmware update before hw testing via [[phab:T261130|T261130]]
* 18:57 ryankemper: (Above deploy complete)
* 18:54 ryankemper: `scap sync-file wmf-config/ProductionServices.php 'Config: [[gerrit:628978{{!}}cloudelastic: envoy sits in front now (T263073)]]'` from `ryankemper@deploy1001:/srv/mediawiki-staging`
* 18:47 ryankemper: Above deploy appears successful, test requests seem to be taking 40ms instead of the previous 140ms
* 18:31 ryankemper: HEAD of `/srv/mediawiki-staging` is now at {{Gerrit|7a96d63d862eacf5244eec79b63d29d78fbaa6f7}}  as expected
* 18:13 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # [[phab:T263628|T263628]]
* 18:13 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # [[phab:T263628|T263628]]
* 18:12 Urbanecm: urbanecm@deploy1001: scap sync-file wmf-config/InitialiseSettings.php 'b1554f36be68106c9364f4aa2fd70d759ad74356: Set $wgCategoryCollation = uca-tr on trwikiquote ([[phab:T263628|T263628]])'
* 18:11 Urbanecm: Logmsgbot seems to be down
* 17:29 robh: migrating ganeti instances off ganeti5002 for troubleshooting per [[phab:T261130|T261130]]
* 16:37 sukhe: upload dnsdist_1.4.0-1~deb10u2 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 16:00 herron: switching icinga over from icinga1001 to alert1001 [[phab:T247966|T247966]]
* 16:00 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2088:3312 from api now that db2104/db2126 are done [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12775 and previous config saved to /var/cache/conftool/dbconfig/20200923-160010-kormat.json
* 15:58 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12774 and previous config saved to /var/cache/conftool/dbconfig/20200923-155819-kormat.json
* 15:57 robh: updating firmware on mw1360, troubleshooting nic failure issue [[phab:T262151|T262151]]
* 15:57 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialBlock.php: {{Gerrit|3234fad0d9b370b1cf75093dd13c0e1639619f08}}: SpecialUnblock: Allow getTargetAndType to accept null $par ([[phab:T263642|T263642]]) (duration: 01m 07s)
* 15:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialUnblock.php: {{Gerrit|3234fad0d9b370b1cf75093dd13c0e1639619f08}}: SpecialUnblock: Allow getTargetAndType to accept null $par ([[phab:T263642|T263642]]) (duration: 01m 08s)
* 15:53 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:51 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:48 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:45 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 15:44 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 15:43 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:43 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12773 and previous config saved to /var/cache/conftool/dbconfig/20200923-154315-kormat.json
* 15:40 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:37 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:28 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12772 and previous config saved to /var/cache/conftool/dbconfig/20200923-152812-kormat.json
* 15:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 15:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:13 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12771 and previous config saved to /var/cache/conftool/dbconfig/20200923-151308-kormat.json
* 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:44 kormat@cumin1001: dbctl commit (dc=all): 'db2126 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12770 and previous config saved to /var/cache/conftool/dbconfig/20200923-144441-kormat.json
* 14:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 herron: grew prometheus1004 prometheus-ops filesystem to 1.6T
* 14:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:629383{{!}}Enable repo config propagateChangeVisibility everywhere]], 2/2 (duration: 01m 06s)
* 14:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:629383{{!}}Enable repo config propagateChangeVisibility everywhere]], 1/2 (duration: 01m 06s)
* 13:50 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12769 and previous config saved to /var/cache/conftool/dbconfig/20200923-135028-kormat.json
* 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12768 and previous config saved to /var/cache/conftool/dbconfig/20200923-133525-kormat.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 100%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12766 and previous config saved to /var/cache/conftool/dbconfig/20200923-132918-root.json
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12765 and previous config saved to /var/cache/conftool/dbconfig/20200923-132022-kormat.json
* 13:20 moritzm: installing ruby-json security updates
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 75%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12764 and previous config saved to /var/cache/conftool/dbconfig/20200923-131414-root.json
* 13:05 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12763 and previous config saved to /var/cache/conftool/dbconfig/20200923-130518-kormat.json
* 13:04 moritzm: installing multipath-tools bugfix updates from buster 10.5 point release
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12762 and previous config saved to /var/cache/conftool/dbconfig/20200923-125911-root.json
* 12:49 moritzm: installing libunwind bugfix updates from buster 10.5 point release
* 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2104 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12761 and previous config saved to /var/cache/conftool/dbconfig/20200923-123922-kormat.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P12760 and previous config saved to /var/cache/conftool/dbconfig/20200923-123806-marostegui.json
* 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Add db2088:3312 to api while db2104 gets depooled [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12759 and previous config saved to /var/cache/conftool/dbconfig/20200923-123649-kormat.json
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly db2074 ', diff saved to https://phabricator.wikimedia.org/P12758 and previous config saved to /var/cache/conftool/dbconfig/20200923-123528-root.json
* 12:22 ema: cp4027: repool with varnish 6.0.6-1wm1 [[phab:T263557|T263557]]
* 12:09 ema: cp4027: depool and upgrade varnish to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 11:52 moritzm: installing GNUTLS bugfix updates from buster 10.5 point release
* 11:51 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.GrowthTasksApi.js: {{Gerrit|73b5ce82b3913232b708405147f0bb6d27128974}}: Fix GrowthTasksApi lazy-loading flags for pages with no views ([[phab:T263611|T263611]]) (duration: 01m 05s)
* 11:49 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEdit.js: {{Gerrit|1ab31a966edc4748f82f75bb370371733c2ca090}}: Mark pageviews as not used in the mobile postedit notice ([[phab:T263611|T263611]]) (duration: 01m 06s)
* 11:38 Urbanecm: Revert https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629188 and fetch to deploy1001 to unblock EU B&C deployment ([[phab:T237467|T237467]]; cc twentyafterfour)
* 11:27 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12756 and previous config saved to /var/cache/conftool/dbconfig/20200923-112712-kormat.json
* 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12755 and previous config saved to /var/cache/conftool/dbconfig/20200923-111209-kormat.json
* 11:08 Urbanecm: Create ContentTranslation tables at testwiki using SQL files from `/srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql` ([[phab:T263417|T263417]]
* 10:57 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12754 and previous config saved to /var/cache/conftool/dbconfig/20200923-105705-kormat.json
* 10:42 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12753 and previous config saved to /var/cache/conftool/dbconfig/20200923-104202-kormat.json
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12752 and previous config saved to /var/cache/conftool/dbconfig/20200923-102120-kormat.json
* 10:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12751 and previous config saved to /var/cache/conftool/dbconfig/20200923-100156-marostegui.json
* 10:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:629133{{!}}Configure entityDataCachePaths for Wikibase]] (duration: 01m 05s)
* 09:59 elukey: update puppet compiler's facts
* 09:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:620050{{!}}Remove $wgExtraLanguageNames from Wikidata and Commons (T260118)]], part 2/2 (production no-op) (duration: 01m 04s)
* 09:55 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:620050{{!}}Remove $wgExtraLanguageNames from Wikidata and Commons (T260118)]], part 1/2 (duration: 01m 16s)
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12750 and previous config saved to /var/cache/conftool/dbconfig/20200923-094511-marostegui.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12748 and previous config saved to /var/cache/conftool/dbconfig/20200923-083200-marostegui.json
* 08:29 moritzm: installing dbus security updates on buster
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12747 and previous config saved to /var/cache/conftool/dbconfig/20200923-080651-marostegui.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12746 and previous config saved to /var/cache/conftool/dbconfig/20200923-071129-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 to re-add change_revision_id index [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12745 and previous config saved to /var/cache/conftool/dbconfig/20200923-070926-marostegui.json
* 06:34 marostegui: Stop MySQL on es2012 and es2018 [[phab:T263613|T263613]] [[phab:T263615|T263615]]
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2018 [[phab:T263615|T263615]]', diff saved to https://phabricator.wikimedia.org/P12744 and previous config saved to /var/cache/conftool/dbconfig/20200923-063140-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2012 for decommmissioning', diff saved to https://phabricator.wikimedia.org/P12743 and previous config saved to /var/cache/conftool/dbconfig/20200923-060812-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index removal [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12742 and previous config saved to /var/cache/conftool/dbconfig/20200923-055850-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12741 and previous config saved to /var/cache/conftool/dbconfig/20200923-055531-marostegui.json
* 05:37 marostegui: Purge global_status_log table on tendril - [[phab:T252331|T252331]]
* 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:03 marostegui: Remove triggers from db2094:3313 for MCR schema change [[phab:T238966|T238966]]
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12739 and previous config saved to /var/cache/conftool/dbconfig/20200923-050234-marostegui.json
* 04:25 eileen: civicrm revision changed from {{Gerrit|8f32b6301f}} to {{Gerrit|a789afd79b}}, config revision is {{Gerrit|9933605187}}


== 2020-09-22 ==
== 2022-09-22 ==
* 23:27 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: clientError: enable on ja,es,de,ru,it,zh,pt wikipedias ([[phab:T255585|T255585]]) (duration: 01m 04s)
* 22:20 joal@deploy1002: Finished deploy [airflow-dags/analytics@901f810]: (no justification provided) (duration: 00m 11s)
* 23:24 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry feature ([[phab:T261249|T261249]]) (duration: 01m 06s)
* 22:19 joal@deploy1002: Started deploy [airflow-dags/analytics@901f810]: (no justification provided)
* 21:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:46 ebernhardson: [[phab:T259539|T259539]] enabled adaptive replica selection on elasticsearch at search.svc.eqiad.wmnet:9[246]43
* 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.10
* 21:23 dancy@deploy1002: backport aborted: (duration: 00m 05s)
* 20:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 42m 21s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 mutante: gerrit2001 (gerrit-replica) restarting gerrit service
* 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:49 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
* 20:55 brennen: end of utc late backport & config window
* 19:44 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.5 (duration: 17m 59s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:54 brennen@deploy1002: Finished scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] (duration: 06m 33s)
* 19:29 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@6c81e6f]: (no justification provided) (duration: 00m 10s)
* 17:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:53 joal@deploy1002: Started deploy [airflow-dags/analytics@6c81e6f]: (no justification provided)
* 17:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 20:48 brennen@deploy1002: brennen and arlolra: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:47 brennen@deploy1002: Started scap: Backport for [[gerrit:834364{{!}}Restrict figure to the size of the media (T305357 T318300)]], [[gerrit:834366{{!}}Fix media alignment since disabling wgParserEnableLegacyMediaDOM (T318300)]]
* 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:36 brennen@deploy1002: backport aborted:  (duration: 02m 16s)
* 16:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:00 robh: running dell epsa test on down host mw1360 per [[phab:T262151|T262151]]
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:34 moritzm: installing  nginx security updates on buster
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:33 shdubsh: restart apache on prometheus nodes to pick up new ext endpoint
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:24 ema: upload libvmod-re2 1.5.3-1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:24 papaul: rebooting ms-be2019
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:15 XioNoX: upgrade FNM on netflow2001 - [[phab:T257035|T257035]]
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] (duration: 06m 09s)
* 14:12 jayme: running ipvsadm -D -t 10.2.1.19:1970; ipvsadm -D -t 10.2.1.21:24766 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 20:19 brennen@deploy1002: brennen and tpt: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 14:12 jayme: running ipvsadm -D -t 10.2.2.19:1970; ipvsadm -D -t 10.2.2.21:24766 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 20:19 brennen@deploy1002: Started scap: Backport for [[gerrit:833817{{!}}Drops JS-side creation of "Source" link (T318266)]]
* 14:11 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 XioNoX: upgrade FNM on netflow5001 - [[phab:T257035|T257035]]
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 shdubsh: restart statsv on webperf[1-2]001 to route metrics through statsd-exporter
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:09 XioNoX: upgrade FNM on netflow1001 - [[phab:T257035|T257035]]
* 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 14:06 XioNoX: upgrade FNM on netflow3001 - [[phab:T257035|T257035]]
* 18:38 jhuneidi@deploy1002: Started scap: testing
* 14:05 jayme: running puppet on lvs servers - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 18:38 dancy@deploy1002: Started scap: testing
* 14:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 18:37 jhuneidi@deploy1002: Started scap: testing
* 14:02 hnowlan: roll-restarting restbase codfw for java updates
* 18:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@265686e]: (no justification provided) (duration: 00m 13s)
* 13:59 XioNoX: add fastnetmon_1.1.7 to buster-wikimedia repo - [[phab:T257035|T257035]]
* 18:33 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@265686e]: (no justification provided)
* 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 18:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 13:55 ema: upload varnish-modules 0.15.0-1+wmf1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 18:23 dancy@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: testing (duration: 00m 02s)
* 13:49 marostegui: Deploy MCR change on db2098:3313 - [[phab:T238966|T238966]]
* 18:23 dancy@deploy1002: Locking from deployment [ALL REPOSITORIES]: testing (planned duration: 60m 00s)
* 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:22 dancy@deploy1002: Installation of scap version "4.22.0" completed for 561 hosts
* 13:39 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:22 dancy@deploy1002: Installing scap version "4.22.0" for 561 hosts
* 13:35 ema: upload libvmod-netmapper 1.8-1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:54 ema: upload varnishkafka 1.1.0-1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:11 moritzm: installing python3.7 security updates on Buster
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:09 moritzm: installing bundler updates on buster
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:59 Urbanecm: Reset password for SUL User:Freibo
* 16:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:58 Lucas_WMDE: EU backport&config window done
* 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:56 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource --fix {{!}} tee [[phab:T263358|T263358]].fix # 1350 to fix, 1350 resolvable, 0 deleted
* 16:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:55 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource {{!}} tee [[phab:T263358|T263358]].dryrun # 1350 to fix, 1350 resolvable, 0 deleted
* 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:628598{{!}}Create Portal and Portal_talk namespaces on trwikisource, and fix an incorrect alias (T263358)]] (duration: 00m 57s)
* 16:39 dancy@deploy1002: Sync cancelled.
* 11:47 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:628521{{!}}Removing Wikipedia store link from enwiki (T262329)]] (duration: 00m 57s)
* 16:39 dancy@deploy1002: dancy and dancy: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:628515{{!}}Set timezone for wikis of the CWIRP to Europe/Rome (T263123)]] (duration: 00m 59s)
* 16:38 dancy@deploy1002: Started scap: Backport for [[gerrit:834352{{!}}InitialiseSettings-labs.php: Added test text (to be reverted) (T317242)]]
* 11:35 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:35 hnowlan: roll-restarting restbase eqiad for java updates
* 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:25 ema: upload varnish 6.0.6-1wm1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:13 moritzm: installing intel-microcode 3.20200616.1 on Buster baremetal servers (compared to to current installed packages this reverts microcode changes for some Skylake CPUs we don't use
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:00 moritzm: installing intel-microcode 3.20200616.1 on Stretch baremetal servers (compared to to current installed packages this reverts microcode changes for some Skylake CPUs we don't use
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:51 XioNoX: Add policy-options for primary IXPs to all routers - [[phab:T262517|T262517]]
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:48 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:48 hnowlan: roll-restarting sessionstore for java security updates
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcf37106d32ddda58948dbd6bc7ef3eb823a8e3d}}: Remove Research Incentive survey on idwiki ([[phab:T316466|T316466]]) (duration: 03m 50s)
* 10:44 moritzm: installing bacula security updates on stretch
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:33 moritzm: installing remaining libx11 security updates
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12733 and previous config saved to /var/cache/conftool/dbconfig/20200922-101342-root.json
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12732 and previous config saved to /var/cache/conftool/dbconfig/20200922-101324-root.json
* 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ff867a48d617bc556be23ac595c4e3c5466f69c1}}: Add wgMetaNamespace for knwiktionary and knwikiquote ([[phab:T318318|T318318]]) (duration: 03m 57s)
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12731 and previous config saved to /var/cache/conftool/dbconfig/20200922-101308-root.json
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:00 kormat: deploying schema change to s2 in eqiad. labsdb will have s2 lag until this finishes. [[phab:T259831|T259831]]
* 12:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 09:59 jayme: running ipvsadm -D -t 10.2.1.45:34192; ipvsadm -D -t 10.2.1.42:35192 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 12:37 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 09:59 jayme: running ipvsadm -D -t 10.2.2.45:34192; ipvsadm -D -t 10.2.2.42:35192 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 12:24 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12730 and previous config saved to /var/cache/conftool/dbconfig/20200922-095839-root.json
* 12:24 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12729 and previous config saved to /var/cache/conftool/dbconfig/20200922-095821-root.json
* 12:22 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12728 and previous config saved to /var/cache/conftool/dbconfig/20200922-095805-root.json
* 12:22 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 09:57 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 12:21 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 09:55 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 07:35 apergos: UTC morning backport and config training deployment window closed a bit belatedly
* 09:51 jayme: running puppet on lvs servers - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:46 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:46 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12727 and previous config saved to /var/cache/conftool/dbconfig/20200922-094336-root.json
* 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833885{{!}}Enable Content and Section Translation in Bhojpuri Wikipedia (T313296)]] (duration: 04m 03s)
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12726 and previous config saved to /var/cache/conftool/dbconfig/20200922-094317-root.json
* 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12725 and previous config saved to /var/cache/conftool/dbconfig/20200922-094302-root.json
* 09:30 volans: repooling ulsfo after merging DNS migration to Netbox zonefiles - [[phab:T258729|T258729]]
* 09:30 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.uptime (exit_code=0)
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12724 and previous config saved to /var/cache/conftool/dbconfig/20200922-092832-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12723 and previous config saved to /var/cache/conftool/dbconfig/20200922-092814-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12722 and previous config saved to /var/cache/conftool/dbconfig/20200922-092758-root.json
* 09:26 jbond@cumin1001: START - Cookbook sre.pdus.uptime
* 09:24 XioNoX: replace BGP_IXP_in with BGP_IXP_PRIMARY_in on cr3-ulsfo IX BGP group - [[phab:T262517|T262517]]
* 09:22 XioNoX: add BGP_IXP_PRIMARY_in to cr3-ulsfo - [[phab:T262517|T262517]]
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12721 and previous config saved to /var/cache/conftool/dbconfig/20200922-091329-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12720 and previous config saved to /var/cache/conftool/dbconfig/20200922-091310-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12719 and previous config saved to /var/cache/conftool/dbconfig/20200922-091255-root.json
* 09:11 jbond42: update snmp string on ps1-a8-codfw
* 09:05 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12718 and previous config saved to /var/cache/conftool/dbconfig/20200922-090520-kormat.json
* 08:58 _joe_: restart pybal on lvs2009
* 08:56 _joe_: restarting pybal on lvs2010
* 08:54 _joe_: restarted pybal on lvs1015
* 08:50 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12717 and previous config saved to /var/cache/conftool/dbconfig/20200922-085017-kormat.json
* 08:36 _joe_: restarting pybal low-traffic in eqiad to pick up lvs changes
* 08:35 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12715 and previous config saved to /var/cache/conftool/dbconfig/20200922-083514-kormat.json
* 08:22 volans: migrating ulsfo public DNS records to the Netbox-generated ones - [[phab:T258729|T258729]]
* 08:20 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12714 and previous config saved to /var/cache/conftool/dbconfig/20200922-082010-kormat.json
* 08:13 kormat: uploaded wmfmariadbpy v0.5 to apt. deploying now to fleet
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2032, es2033 and es2034 for the first time with minimal weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12713 and previous config saved to /var/cache/conftool/dbconfig/20200922-081154-marostegui.json
* 07:57 volans: migrating ulsfo private DNS records to the Netbox-generated ones - [[phab:T258729|T258729]]
* 07:54 kormat@cumin1001: dbctl commit (dc=all): 'db2076 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12712 and previous config saved to /var/cache/conftool/dbconfig/20200922-075429-kormat.json
* 07:51 jayme: running ipvsadm -D -t 10.2.1.18:8080; ipvsadm -D -t 10.2.1.46:3030 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:49 jayme: running ipvsadm -D -t 10.2.2.18:8080; ipvsadm -D -t 10.2.2.46:3030 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:46 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:42 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:39 jayme: running puppet on lvs servers - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:34 volans: depooling ulsfo to merge DNS migration to Netbox zonefiles - [[phab:T258729|T258729]]
* 07:24 marostegui: Stop MySQL on es2014 - host will be decommissioned [[phab:T262889|T262889]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2014 from dbctl [[phab:T262889|T262889]]', diff saved to https://phabricator.wikimedia.org/P12711 and previous config saved to /var/cache/conftool/dbconfig/20200922-071435-marostegui.json
* 07:11 XioNoX: cr1-codfw# run clear bfd session address fe80::f27c:c7ff:fe11:2c1b
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 for decommissioning [[phab:T262889|T262889]]', diff saved to https://phabricator.wikimedia.org/P12710 and previous config saved to /var/cache/conftool/dbconfig/20200922-061815-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 100%: Slowly repool after recloning es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12709 and previous config saved to /var/cache/conftool/dbconfig/20200922-054455-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 100%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12708 and previous config saved to /var/cache/conftool/dbconfig/20200922-054438-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 100%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12707 and previous config saved to /var/cache/conftool/dbconfig/20200922-054430-root.json
* 05:41 marostegui: Log remove triggers on revision table on db1124:3313 [[phab:T238966|T238966]]
* 05:39 marostegui: Deploy MCR schema change on s3 eqiad, this will generate lag on s3 on labsdb [[phab:T238966|T238966]]
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2032, es2033 and es2034 into dbctl [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12706 and previous config saved to /var/cache/conftool/dbconfig/20200922-053346-marostegui.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 75%: Slowly repool after recloning es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12705 and previous config saved to /var/cache/conftool/dbconfig/20200922-052951-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 75%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12704 and previous config saved to /var/cache/conftool/dbconfig/20200922-052935-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 75%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12703 and previous config saved to /var/cache/conftool/dbconfig/20200922-052926-root.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 50%: Slowly repool after recloning es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12702 and previous config saved to /var/cache/conftool/dbconfig/20200922-051448-root.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 50%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12701 and previous config saved to /var/cache/conftool/dbconfig/20200922-051431-root.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 50%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12700 and previous config saved to /var/cache/conftool/dbconfig/20200922-051423-root.json
* 05:00 marostegui: Add es2032 es2033 and es2034 to tendril and zarcillo [[phab:T261717|T261717]]
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 25%: Slowly repool after recloning es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12699 and previous config saved to /var/cache/conftool/dbconfig/20200922-045944-root.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 25%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12698 and previous config saved to /var/cache/conftool/dbconfig/20200922-045928-root.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 25%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12697 and previous config saved to /var/cache/conftool/dbconfig/20200922-045919-root.json
* 01:35 ryankemper: `sudo cumin C:profile::services_proxy::envoy 'enable-puppet "adding cloudelastic to the service proxy --rkemper"'` done
* 01:35 ryankemper: woot! `curl -X GET -s 'http://localhost:6105/_cluster/health'` gives a response as expected. (As do 6106 and 6107). Re-enabling puppet across the fleet...
* 01:32 ryankemper: `sudo run-puppet-agent -e "adding cloudelastic to the service proxy --rkemper"` on `mwdebug1002.eqiad.wmnet`
* 01:28 ryankemper: `sudo puppet-merge` done, now will run puppet on a single eqiad appserver and verify we can curl `localhost:610<nowiki>{</nowiki>5,6,7<nowiki>}</nowiki>`
* 01:17 ryankemper: Disabling puppet on affected nodes via `sudo cumin C:profile::services_proxy::envoy 'disable-puppet "adding cloudelastic to the service proxy --rkemper"'`
* 01:17 ryankemper: Going to test patch to stick envoy in front of `cloudelastic`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/628243


== 2020-09-21 ==
== 2022-09-21 ==
* 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:39 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:36 mutante: debmonitor2002 - systemctl reset-failed
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:46 tgr_: UTC late deploys done
* 22:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:44 tgr@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833810{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 42s)
* 22:20 mutante: releases.wikimedia.org has been converted to an active-active service with geodns/ backends in both DCs
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:56 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:54 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:51 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:36 tgr@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:833809{{!}}Block metrics: Bump schema to un-require some fields (T317343)]] (duration: 03m 55s)
* 21:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:49 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: adjust enwiktionary completion search ranking (duration: 00m 57s)
* 20:25 samtar@deploy1002: Finished scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] (duration: 04m 19s)
* 20:47 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/CirrusSearch/: Remove pages from completion search by page id (duration: 01m 00s)
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:04 herron: moving prometheus instance from bast3004 to prometheus3001 [[phab:T243057|T243057]]
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:46 herron: moving prometheus instance from bast4002 to prometheus4001 [[phab:T243057|T243057]]
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:38 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Push notifications deployment (4/5) (duration: 00m 57s)
* 20:21 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 19:34 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Push notifications deployment (3/5) (duration: 00m 57s)
* 20:20 samtar@deploy1002: Started scap: Backport for [[gerrit:833463{{!}}cirrus: Limit shard count to 1 in deployment-prep (T316711)]]
* 19:28 mholloway-shell@deploy1001: Synchronized wmf-config/ProductionServices.php: Push notifications deployment (2/5) (duration: 00m 57s)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:26 mholloway-shell@deploy1001: Synchronized wmf-config/LabsServices.php: Push notifications deployment (1/5) (duration: 00m 57s)
* 20:17 samtar@deploy1002: Finished scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] (duration: 05m 31s)
* 19:19 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:18 mepps: updated crm to {{Gerrit|8f32b6301f}}
* 20:12 samtar@deploy1002: samtar and kemayo: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 19:15 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:11 samtar@deploy1002: Started scap: Backport for [[gerrit:833837{{!}}Enable DiscussionTools visual enhancements as beta on en/dewiki (T315625)]]
* 19:14 ejegg: updated fundraising CiviCRM from {{Gerrit|e5ebf9d18a}} to {{Gerrit|8f32b6301f}}
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:13 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:59 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622863 [[phab:T249745|T249745]] (duration: 00m 56s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:57 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update {{Gerrit|I336365}} (duration: 06m 54s)
* 20:09 samtar@deploy1002: Finished scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] (duration: 05m 16s)
* 18:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on plwiki ([[phab:T254239|T254239]]) and ptwiki ([[phab:T255027|T255027]]) (duration: 00m 56s)
* 20:04 samtar@deploy1002: samtar and zabe: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:50 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update {{Gerrit|I336365}}
* 20:04 samtar@deploy1002: Started scap: Backport for [[gerrit:833830{{!}}Remove deployment-db08 (T318126)]]
* 18:33 mepps: updated crm from {{Gerrit|cc1f7e6d13}} to {{Gerrit|e5ebf9d18a}}
* 19:33 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@ce20ecd]: (no justification provided) (duration: 00m 10s)
* 18:26 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Define Chinese logo variants for Modern Vector (no-op) (part 2) ([[phab:T261153|T261153]]) (duration: 00m 56s)
* 19:33 nokafor@deploy1002: Started deploy [airflow-dags/analytics@ce20ecd]: (no justification provided)
* 18:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Define Chinese logo variants for Modern Vector (no-op) ([[phab:T261153|T261153]]) (duration: 00m 57s)
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 catrope@deploy1001: Synchronized static/images/mobile/copyright/: Update Chinese logo variants for Modern Vector ([[phab:T261153|T261153]]) (duration: 00m 56s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 XioNoX: add NAT rule to pfw3-codfw - [[phab:T263488|T263488]]
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:42 papaul: rebooting ps1-a8-codfw firmware upgrade
* 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:46 papaul: shutting down ms-be2019 for BBU replacing
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b8b2ebd3933cb891b62bb6aea01b2342c017cec8}}: Growth: Switch pilot wikis to structured mentor list ([[phab:T310905|T310905]]) (duration: 03m 59s)
* 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12696 and previous config saved to /var/cache/conftool/dbconfig/20200921-162433-root.json
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:17 papaul: replacing  msw-c8-codfw
* 19:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:16 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12695 and previous config saved to /var/cache/conftool/dbconfig/20200921-160929-root.json
* 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 18:55 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8] (duration: 00m 08s)
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12694 and previous config saved to /var/cache/conftool/dbconfig/20200921-155426-root.json
* 18:55 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8] (thin): Regular analytics weekly train THIN [analytics/refinery@91d0cf8]
* 15:51 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: [[gerrit:628808{{!}}Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 56s)
* 18:44 nokafor@deploy1002: Finished deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8] (duration: 05m 40s)
* 15:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: [[gerrit:628808{{!}}Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 59s)
* 18:38 nokafor@deploy1002: Started deploy [analytics/refinery@91d0cf8]: Regular analytics weekly train [analytics/refinery@91d0cf8]
* 15:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 14:56 Emperor: set thanos ring replicas to 3.75 [[phab:T311690|T311690]]
* 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12693 and previous config saved to /var/cache/conftool/dbconfig/20200921-153923-root.json
* 14:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) [*actually* sync it this time since I forgot to git rebase before the last sync 🤦] (duration: 03m 41s)
* 15:24 hnowlan: roll-restarting restbase-dev for java security updates
* 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:24 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Take db2124 back out of dump/vslow [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12692 and previous config saved to /var/cache/conftool/dbconfig/20200921-151210-kormat.json
* 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:10 moritzm: rolling restart of mw canaries in codfw to pick up libx11 update
* 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:07 moritzm: installing libx11 security updates on stretch
* 14:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833783{{!}}Pool deployment-db09, depool deployment-db08 (T318126)]] (Beta-only, exchange one replica for another) (duration: 03m 48s)
* 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12691 and previous config saved to /var/cache/conftool/dbconfig/20200921-150233-kormat.json
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12690 and previous config saved to /var/cache/conftool/dbconfig/20200921-144729-kormat.json
* 13:59 Lucas_WMDE: UTC afternoon backport+config window done
* 14:40 moritzm: installing qemu security updates on ganeti* stretch nodes
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:37 papaul: firmware upgrade on db2127
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:36 moritzm: installing qemu security updates on ganeti2011 and gnt-instance reboot debmonitor2001
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833776{{!}}Add back deployment-db08 (T318126)]] (Beta-only, restore old replica) (duration: 03m 48s)
* 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:32 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12689 and previous config saved to /var/cache/conftool/dbconfig/20200921-143226-kormat.json
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:30 herron: moving prometheus from bast5001 to prometheus5001 [[phab:T243057|T243057]]
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:24 papaul: disconnecting mgmt on msw-c1-codfw to re-do cable end [[phab:T263138|T263138]]
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:21 marostegui: Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing [[phab:T263443|T263443]]
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:17 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12688 and previous config saved to /var/cache/conftool/dbconfig/20200921-141722-kormat.json
* 13:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/db-labs.php: Config: [[gerrit:833461{{!}}Replace deployment-db08 with deployment-db09 (T318126)]] (Beta-only, replace one replica with another) (duration: 03m 56s)
* 14:11 papaul: disconnecting mgmt on msw-d6-codfw to re-do cable end [[phab:T263138|T263138]]
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:00 moritzm: installing Java security updates on restbase/sessionstore*
* 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2117 for schema change, add db2124 to dump/vslow in the interim [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12687 and previous config saved to /var/cache/conftool/dbconfig/20200921-135821-kormat.json
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:21 moritzm: installing glib-networking security updates for Stretch
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 marostegui: Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing [[phab:T263443|T263443]]
* 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830817{{!}}Add editcontentmodel right for metawiki translation administrators (T311587)]] (duration: 03m 50s)
* 12:59 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=codfw
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:38 XioNoX: set same OSPF metric on both eqiad/codfw links - [[phab:T263230|T263230]]
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:26 marostegui: Set innodb_change_buffering = all; on db2071 (s1 slave) for performance testing  [[phab:T263443|T263443]]
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:26 marostegui: Set innodb_change_buffering = all; on db2129 (s6 master) for performance testing  [[phab:T263443|T263443]]
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:38 effie: restart pybal on lvs2009 and lvs1015 - [[phab:T256973|T256973]]
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed', diff saved to https://phabricator.wikimedia.org/P12684 and previous config saved to /var/cache/conftool/dbconfig/20200921-113708-marostegui.json
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:35 Urbanecm: EU B&C done
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:830707{{!}}Disable wgParserEnableLegacyMediaDOM on enwikivoyage (T314318)]] (turning on new-style media output) (duration: 04m 03s)
* 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend/includes/Transforms/MoveLeadParagraphTransform.php: {{Gerrit|3fab5882505809b412cff641a17ae5d973db04a4}}: Simplify lead paragraph check (duration: 00m 59s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:22 effie: restart pybal on lvs2010 and lvs1016 - [[phab:T256973|T256973]]
* 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a62212a5a8f4692b860eb3d6c3322c82d88125a9}}: Allow local steward group members to bigdelete (duration: 00m 57s)
* 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:12 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=shnwiktionary --fix # [[phab:T256348|T256348]] # P12683
* 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1cf4664df87f10bf60b47345dfe3c52d7dc24f6c}}: Set WT namespace alias to NS_PROJECT in shn.wiktionary ([[phab:T256348|T256348]]) (duration: 00m 57s)
* 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|01ba82866f3e04c7c635e9089fed4269190b93f0}}: Add archive.wul.waseda.ac.jp to the wgCopyUploadDomains ([[phab:T261037|T261037]]) (duration: 00m 57s)
* 08:19 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 04m 02s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bd51f47b1f60fbfafdcc623ae22dcadf2c927876}}: Add *.70yearsindonesiaaustralia.com to the wgCopyUploadsDomains allowlist of commonswiki ([[phab:T262238|T262238]]) (duration: 00m 57s)
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:02 effie: restart pybal on lvs2010 and lvs1016 - [[phab:T256973|T256973]]
* 08:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 10:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:628766{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:628766{{!}} Bumping portals to master (T128546)]] (duration: 01m 12s)
* 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:03 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12682 and previous config saved to /var/cache/conftool/dbconfig/20200921-090343-kormat.json
* 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:48 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12681 and previous config saved to /var/cache/conftool/dbconfig/20200921-084840-kormat.json
* 08:07 hashar: Restarting Gerrit to clear stalled sockets in Zuul
* 08:48 marostegui: Stop MySQL on db2127 for on-site maintenance - [[phab:T262247|T262247]]
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 [[phab:T262247|T262247]]', diff saved to https://phabricator.wikimedia.org/P12680 and previous config saved to /var/cache/conftool/dbconfig/20200921-084730-marostegui.json
* 08:33 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12679 and previous config saved to /var/cache/conftool/dbconfig/20200921-083337-kormat.json
* 08:21 godog: swift codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:18 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12678 and previous config saved to /var/cache/conftool/dbconfig/20200921-081833-kormat.json
* 08:15 godog: roll-restart swift-object-replicator in codfw and eqiad for increased concurrency
* 07:53 hashar: Upgrading all CI Jenkins jobs to Quibble 0.0.45
* 07:05 XioNoX: upgrade FNM to 1.1.7 in ulsfo - [[phab:T257035|T257035]]
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12677 and previous config saved to /var/cache/conftool/dbconfig/20200921-060053-marostegui.json
* 05:48 marostegui: Set innodb_change_buffering = inserts; on db2129 (s6 master) for performance testing
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12676 and previous config saved to /var/cache/conftool/dbconfig/20200921-054730-marostegui.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12675 and previous config saved to /var/cache/conftool/dbconfig/20200921-052704-marostegui.json
* 05:18 marostegui: Stop mysql on: es2013 es2016 es2019 to clone es2032 es2033 es2034 - [[phab:T261717|T261717]]
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12674 and previous config saved to /var/cache/conftool/dbconfig/20200921-050632-marostegui.json
* 05:06 marostegui: Deploy MCR schema change on s8 eqiad master, lag will appear on s8 (wikidata) on labsdb hosts [[phab:T238966|T238966]]
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013,es2016 and es2019 to clone new hosts [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12673 and previous config saved to /var/cache/conftool/dbconfig/20200921-050305-marostegui.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2015 as es2 codfw master [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12672 and previous config saved to /var/cache/conftool/dbconfig/20200921-050228-marostegui.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12671 and previous config saved to /var/cache/conftool/dbconfig/20200921-045919-marostegui.json
* 04:37 marostegui: Set innodb_change_buffering = inserts; on db2116 for performance testing
* 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 for the first time with minimal weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12670 and previous config saved to /var/cache/conftool/dbconfig/20200921-043154-marostegui.json


== 2020-09-20 ==
== 2022-09-20 ==
* 08:46 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki  --logwiki=metawiki 'Tepig10102020' 'Davidfromtheworld' # [[phab:T263317|T263317]]
* 20:19 cjming: end of UTC late backport window
* 07:42 gehel: depooling wdqs2002 to catch up on lag
* 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:36 gehel: restarting blazegraph + updater on wdqs2002
* 20:13 cjming@deploy1002: Finished scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] (duration: 09m 02s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262] (duration: 00m 07s)
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@62d8262] (thin): Regular analytics weekly train THIN [analytics/refinery@62d8262]
* 20:05 cjming@deploy1002: cjming and jdlrobson: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
* 20:04 mforns@deploy1002: Finished deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262] (duration: 08m 00s)
* 20:04 cjming@deploy1002: Started scap: Backport for [[gerrit:833435{{!}}Enable Nearby everywhere (T246493)]]
* 20:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
* 20:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
* 20:01 eileen: civicrm upgraded from {{Gerrit|e82d9cd0}} to {{Gerrit|dcef393d}}
* 19:56 mforns@deploy1002: Started deploy [analytics/refinery@62d8262]: Regular analytics weekly train [analytics/refinery@62d8262]
* 19:05 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 18:50 jynus: restart db2100:s7 to apply new config
* 18:48 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 18:47 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 18:47 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 18:47 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 18:47 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 18:46 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 18:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
* 18:45 cstone: payments-wiki upgraded from {{Gerrit|de4b2bb9}} to {{Gerrit|0456850e}}
* 18:45 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:36 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 18:33 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:33 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 18:32 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:31 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 18:31 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
* 18:29 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
* 18:28 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
* 18:28 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
* 18:27 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
* 18:27 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
* 18:26 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
* 18:23 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:22 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
* 18:22 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:21 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
* 18:20 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
* 18:19 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
* 16:42 dancy@deploy1002: Sync cancelled.
* 16:42 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:41 dancy@deploy1002: Started scap: testing, disregard
* 16:09 awight@deploy1002: backport aborted:  (duration: 00m 33s)
* 16:04 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (take 2) (duration: 03m 42s)
* 15:55 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:833411{{!}}Disable Tech Wishes survey on dewiki (T316676)]] (duration: 03m 53s)
* 14:16 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:10 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 14:00 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided) (duration: 00m 15s)
* 14:00 nokafor@deploy1002: Started deploy [airflow-dags/analytics@1a7c3b9]: (no justification provided)
* 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P34884 and previous config saved to /var/cache/conftool/dbconfig/20220920-135006-ladsgroup.json
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:43 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.2/extensions/GrowthExperiments/extension.json: {{Gerrit|1ac09d4709c645558f644a885fadc49c05cc04b9}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 39s)
* 13:39 urbanecm@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/GrowthExperiments/extension.json: {{Gerrit|1a27e05a7ca53a063d5f9e284d6a09546ac8691c}}: Update HomepageModule schema version ([[phab:T310320|T310320]]) (duration: 03m 52s)
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 nokafor@deploy1002: Finished deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided) (duration: 00m 11s)
* 13:25 nokafor@deploy1002: Started deploy [airflow-dags/analytics@0e9fb6b]: (no justification provided)
* 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0b55db6f80df5f4c89f969332a6b31077a7172c4}}: Enable Tech Wishes survey on dewiki ([[phab:T316676|T316676]]) (duration: 04m 12s)
* 09:58 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 09:27 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:46 awight@deploy1002: Finished deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854" (duration: 02m 27s)
* 08:43 awight@deploy1002: Started deploy [kartotherian/deploy@4759a78]: Merge "Update kartotherian to e3f3854"
* 08:35 hashar: Restarted CI Jenkins for plugin update
* 08:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 08:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 07:18 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832993{{!}}testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias (T317289)]] (duration: 03m 46s)
* 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:10 kart_: Updated cxserver to 2022-09-15-113346-production ([[phab:T317289|T317289]], [[phab:T315209|T315209]])
* 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 07:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 07:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 07:03 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 07:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 04:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.39.0-wmf.28 (duration: 02m 02s)
* 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]] (duration: 36m 08s)
* 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.2  refs [[phab:T314191|T314191]]
* 02:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-09-19 ==
== 2022-09-19 ==
* 19:03 ariel@deploy1001: Finished deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed (duration: 00m 04s)
* 22:59 ebernhardson: [[phab:T317200|T317200]] start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
* 19:02 ariel@deploy1001: Started deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed
* 21:21 maryum: Deployed security patch for [[phab:T302479|T302479]]
* 16:49 ejegg: reverted PayPal failmail diversion - IPN verification is working again
* 21:21 mstyles@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
* 16:27 ejegg: Diverted SmashPig PayPal failmail to eeggleston only
* 21:15 sbassett: Deployed security patch for [[phab:T312820|T312820]]
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 cjming: end of UTC late backport window
* 20:59 ebernhardson@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: [[gerrit:833031{{!}}Add token_count subfield to outgoing_link (T317546)]] (duration: 03m 51s)
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:21 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820459{{!}}Wikifunctions: Drop two config items moved to docker]] (duration: 03m 38s)
* 20:21 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:16 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:829877{{!}}ExtensionDistributor: Add REL1_39 (T313925)]] (duration: 03m 38s)
* 20:12 cjming@deploy1002: Finished scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] (duration: 06m 31s)
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:06 cjming@deploy1002: cjming and arlolra: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:06 cjming@deploy1002: Started scap: Backport for [[gerrit:832715{{!}}Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]]
* 19:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 19:33 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 19:30 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 19:30 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
* 17:43 dancy@deploy1002: Installation of scap version "4.21.0" completed for 561 hosts
* 17:42 dancy@deploy1002: Installing scap version "4.21.0" for 561 hosts
* 17:36 dancy@deploy1002: Sync cancelled.
* 17:36 dancy@deploy1002: dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 17:36 dancy@deploy1002: Started scap: testing, disregard
* 14:03 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage<nowiki>{</nowiki>.png,-1.5x.png,-2x.png<nowiki>}</nowiki> ([[phab:T317718|T317718]])
* 14:02 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|6c7151d969b6997bd9cce042b7bc78c282dd9b26}}: Regenerate ukwikivoyage logo ([[phab:T317718|T317718]]) (duration: 03m 46s)
* 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbf161d148228e0e706813f923ab1a5d4b42757a}}: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro ([[phab:T314518|T314518]]) (duration: 04m 01s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a6c1ddf5cd1a46ab05f5d6fda4b938a3ee37238}}: Remove unnecessary wgNamespaceAliases from bnwiki ([[phab:T318003|T318003]]) (duration: 04m 16s)
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2020-09-18 ==
== 2022-09-17 ==
* 21:48 tzatziki: changed password for Millennium bug@ptwiki
* 12:17 Emperor: set thanos ring replicas to 3.80 [[phab:T311690|T311690]]
* 19:28 eileen: process-control config revision is {{Gerrit|739ea754ca}}
* 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34879 and previous config saved to /var/cache/conftool/dbconfig/20220917-103903-ladsgroup.json
* 18:52 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34878 and previous config saved to /var/cache/conftool/dbconfig/20220917-102356-ladsgroup.json
* 18:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P34877 and previous config saved to /var/cache/conftool/dbconfig/20220917-100850-ladsgroup.json
* 18:44 ryankemper: `sudo kill 254017 254018 254028 254029` to kill some dangling serdi / gzip processes, all the wikidata cleanup should be complete
* 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34876 and previous config saved to /var/cache/conftool/dbconfig/20220917-095344-ladsgroup.json
* 18:38 ryankemper: `sudo kill 126121 126122 126124 126128 249520 249521 254016 254027` on `snapshot1008` to terminate wikidata dump jobs that are in a bad state
* 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34875 and previous config saved to /var/cache/conftool/dbconfig/20220917-094856-ladsgroup.json
* 18:10 ryankemper: Removed stale `wikidatardf-dumps` crontab entry from `dumpsgen@snapshot1008`, stored backup of previous state of crontab in the (admittedly verbose) `/tmp/dumpsgen_crontab_before_removing_stale_wikidata_dump_entry_see_gerrit_puppet_patch_622342`
* 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34874 and previous config saved to /var/cache/conftool/dbconfig/20220917-093349-ladsgroup.json
* 17:15 mutante: lists1001 - apt-get install pwgen to generate passwords (this was installed on previous list server but apparently not puppetized, puppet patch coming up)
* 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P34873 and previous config saved to /var/cache/conftool/dbconfig/20220917-091843-ladsgroup.json
* 16:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34872 and previous config saved to /var/cache/conftool/dbconfig/20220917-090336-ladsgroup.json
* 16:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34871 and previous config saved to /var/cache/conftool/dbconfig/20220917-074806-ladsgroup.json
* 15:09 mutante: restarting gerrit service to apply gerrit::628338 to make it dump heap if out of memory ([[phab:T263008|T263008]])
* 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34870 and previous config saved to /var/cache/conftool/dbconfig/20220917-073300-ladsgroup.json
* 14:15 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync ([[phab:T261488|T261488]]) (duration: 00m 56s)
* 07:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P34869 and previous config saved to /var/cache/conftool/dbconfig/20220917-071753-ladsgroup.json
* 14:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync ([[phab:T261488|T261488]]) (duration: 01m 00s)
* 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34868 and previous config saved to /var/cache/conftool/dbconfig/20220917-070247-ladsgroup.json
* 13:02 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34867 and previous config saved to /var/cache/conftool/dbconfig/20220917-051719-ladsgroup.json
* 13:00 kormat@cumin2001: START - Cookbook sre.hosts.downtime
* 05:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 12:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 05:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 12:41 kormat: reimaging db2125 [[phab:T263244|T263244]]
* 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34866 and previous config saved to /var/cache/conftool/dbconfig/20220917-051527-ladsgroup.json
* 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12665 and previous config saved to /var/cache/conftool/dbconfig/20200918-123947-kormat.json
* 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12664 and previous config saved to /var/cache/conftool/dbconfig/20200918-122444-kormat.json
* 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
* 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12663 and previous config saved to /var/cache/conftool/dbconfig/20200918-120940-kormat.json
* 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34865 and previous config saved to /var/cache/conftool/dbconfig/20220917-051203-ladsgroup.json
* 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12662 and previous config saved to /var/cache/conftool/dbconfig/20200918-115437-kormat.json
* 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125', diff saved to https://phabricator.wikimedia.org/P12661 and previous config saved to /var/cache/conftool/dbconfig/20200918-113509-marostegui.json
* 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12660 and previous config saved to /var/cache/conftool/dbconfig/20200918-111529-kormat.json
* 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12659 and previous config saved to /var/cache/conftool/dbconfig/20200918-105645-kormat.json
* 10:45 jiji@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:41 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12658 and previous config saved to /var/cache/conftool/dbconfig/20200918-104141-kormat.json
* 10:35 jiji@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:28 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:26 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12657 and previous config saved to /var/cache/conftool/dbconfig/20200918-102638-kormat.json
* 10:11 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12656 and previous config saved to /var/cache/conftool/dbconfig/20200918-101135-kormat.json
* 09:55 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12655 and previous config saved to /var/cache/conftool/dbconfig/20200918-095554-kormat.json
* 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:47 twentyafterfour: deployed hotfix for [[phab:T263063|T263063]] to phab1001
* 09:47 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1001 - [[phab:T262527|T262527]]
* 09:46 jayme: uncordoned kubestage1001 - [[phab:T262527|T262527]]
* 09:46 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12654 and previous config saved to /var/cache/conftool/dbconfig/20200918-094608-kormat.json
* 09:31 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 80%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12653 and previous config saved to /var/cache/conftool/dbconfig/20200918-093105-kormat.json
* 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:22 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 60%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12652 and previous config saved to /var/cache/conftool/dbconfig/20200918-091601-kormat.json
* 09:00 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 40%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12651 and previous config saved to /var/cache/conftool/dbconfig/20200918-090058-kormat.json
* 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:56 jayme: reboot kubestage1001 for clean state - [[phab:T262527|T262527]]
* 08:54 elukey: change analytics-in4/in6 filters on cr1/cr2 after https://gerrit.wikimedia.org/r/628300
* 08:47 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:45 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 20%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12650 and previous config saved to /var/cache/conftool/dbconfig/20200918-084554-kormat.json
* 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:43 jayme: reboot kubestage1001 for kernel upgrade - [[phab:T262527|T262527]]
* 08:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:25 jayme: reboot kubestage1001 for clean state testing - [[phab:T262527|T262527]]
* 08:22 kormat@cumin1001: dbctl commit (dc=all): 'db2124 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12648 and previous config saved to /var/cache/conftool/dbconfig/20200918-082223-kormat.json
* 08:16 klausman: reinstalling stat1004 with Buster
* 07:17 moritzm: installing xdg-utils security updates
* 07:14 XioNoX: push pfw policies - [[phab:T263168|T263168]]
* 07:12 jayme: draining kubestage1001 for kernel upgrade - [[phab:T262527|T262527]]
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2018, es2012 after cloning es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12647 and previous config saved to /var/cache/conftool/dbconfig/20200918-062127-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12646 and previous config saved to /var/cache/conftool/dbconfig/20200918-060815-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after rack move', diff saved to https://phabricator.wikimedia.org/P12645 and previous config saved to /var/cache/conftool/dbconfig/20200918-060724-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12644 and previous config saved to /var/cache/conftool/dbconfig/20200918-060103-marostegui.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12643 and previous config saved to /var/cache/conftool/dbconfig/20200918-053758-marostegui.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2029 and es2030 to dbctl depooled - [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12642 and previous config saved to /var/cache/conftool/dbconfig/20200918-053604-marostegui.json
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12641 and previous config saved to /var/cache/conftool/dbconfig/20200918-052608-marostegui.json
* 05:15 marostegui: Restart wikibugs


== 2020-09-17 ==
== 2022-09-16 ==
* 23:41 ejegg: updated payments-wiki from {{Gerrit|86c997fdb2}} to {{Gerrit|7bb99ce03a}}
* 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 23:01 ejegg: updated payments-wiki from {{Gerrit|1e5a52ed26}} to {{Gerrit|86c997fdb2}}
* 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 20:47 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|19b9b9877ea3f8ffa6626108941891c2454348de}}: Fix APCOND_FR_NEVERBLOCKED handling (part 3; [[phab:T262970|T262970]]) (duration: 00m 57s)
* 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34864 and previous config saved to /var/cache/conftool/dbconfig/20220916-212905-ladsgroup.json
* 19:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34863 and previous config saved to /var/cache/conftool/dbconfig/20220916-211358-ladsgroup.json
* 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P34862 and previous config saved to /var/cache/conftool/dbconfig/20220916-205852-ladsgroup.json
* 19:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=wikidatawiki --logwiki=metawiki 'Filomena ciavarella' 'Filomena Ciavarella' #[[phab:T262657|T262657]]
* 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34861 and previous config saved to /var/cache/conftool/dbconfig/20220916-204345-ladsgroup.json
* 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 19:16 mutante: cp1081 /usr/local/sbin/update-ocsp-all
* 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:01 mutante: gitlab-runner*: deployed gerrit:832584 and systemctl restart buildkitd on 6 hosts for [[phab:T317904|T317904]]
* 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 16:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 18:29 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 18:11 Urbanecm: Morning B&C done
* 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|40591d3dfdc2fc360cb060770677a48e2a53362c}}: Enable DiscussionTools beta on jawiki & viwiki ([[phab:T261654|T261654]]; [[phab:T262109|T262109]]) (duration: 00m 56s)
* 16:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 18:06 Urbanecm: Move /srv/mediawiki-stagging/grep (owned by tstarling) to /home/urbanecm to make working directory clean (cc TimStarling)
* 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2183.mgmt.codfw.wmnet with reboot policy FORCED
* 17:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:20 rzl: repooled eqiad at 17:11
* 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184
* 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 16:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2184
* 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2183
* 17:03 papaul: restarting ps1-d8-codfw
* 16:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2183
* 16:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 01m 12s)
* 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34860 and previous config saved to /var/cache/conftool/dbconfig/20220916-161409-ladsgroup.json
* 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 16:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 02m 50s)
* 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
* 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34859 and previous config saved to /var/cache/conftool/dbconfig/20220916-161346-ladsgroup.json
* 16:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 07m 26s)
* 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34858 and previous config saved to /var/cache/conftool/dbconfig/20220916-155840-ladsgroup.json
* 16:33 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 15:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 16:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema (duration: 06m 14s)
* 15:52 dancy@deploy1002: Installation of scap version "4.20.0" completed for 561 hosts
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 dancy@deploy1002: Installing scap version "4.20.0" for 561 hosts
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:44 dancy@deploy1002: Finished scap: testing (duration: 04m 53s)
* 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P34857 and previous config saved to /var/cache/conftool/dbconfig/20220916-154333-ladsgroup.json
* 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema
* 15:39 dancy@deploy1002: Started scap: testing
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34856 and previous config saved to /var/cache/conftool/dbconfig/20220916-152827-ladsgroup.json
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 16:21 marostegui: Restart wikibugs
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 16:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 16:15 papaul: replacing msw-d8-codfw
* 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1131 IP after moving it to a different rack [[phab:T262901|T262901]]', diff saved to https://phabricator.wikimedia.org/P12639 and previous config saved to /var/cache/conftool/dbconfig/20200917-160540-marostegui.json
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:03 marostegui: Recreate db1131 on tendril [[phab:T262901|T262901]]
* 15:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:59 marostegui: Update rack location on zarcillo for db1131 [[phab:T262901|T262901]]
* 15:01 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:57 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 100% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12638 and previous config saved to /var/cache/conftool/dbconfig/20200917-155708-kormat.json
* 15:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:44 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 75% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12637 and previous config saved to /var/cache/conftool/dbconfig/20200917-154431-kormat.json
* 15:01 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
* 15:43 mepps: updated payments-wiki from {{Gerrit|3c073a6a56}} to {{Gerrit|1e5a52ed26}}
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:35 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 50% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12636 and previous config saved to /var/cache/conftool/dbconfig/20200917-153514-kormat.json
* 14:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 15:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:20 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 25% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12635 and previous config saved to /var/cache/conftool/dbconfig/20200917-152019-kormat.json
* 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:17 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12634 and previous config saved to /var/cache/conftool/dbconfig/20200917-151347-marostegui.json
* 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12633 and previous config saved to /var/cache/conftool/dbconfig/20200917-150234-marostegui.json
* 14:42 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:02 jynus: deploying extended grants for admin account on sys/p_s at s8@codfw [[phab:T195578|T195578]]
* 14:39 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:00 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:00 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2114: depool for schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12632 and previous config saved to /var/cache/conftool/dbconfig/20200917-145451-kormat.json
* 14:17 godog: add 100G to prometheus/eqiad instance k8s-mlserve
* 14:49 cmjohnson1: ending pdu maintenance in eqiad
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 14:40 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12631 and previous config saved to /var/cache/conftool/dbconfig/20200917-143914-marostegui.json
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 14:32 papaul: replacing msw-d1,d2,d3,d4,d5 and d6
* 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:31 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 14:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12630 and previous config saved to /var/cache/conftool/dbconfig/20200917-141825-marostegui.json
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 14:02 marostegui: Start mysql on db1125 after PDU maintenance [[phab:T261459|T261459]]
* 13:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12629 and previous config saved to /var/cache/conftool/dbconfig/20200917-140014-marostegui.json
* 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 13:33 jayme: ran ipvsadm -D -t 10.2.2.14:8888 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34855 and previous config saved to /var/cache/conftool/dbconfig/20220916-131902-root.json
* 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34854 and previous config saved to /var/cache/conftool/dbconfig/20220916-130357-root.json
* 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34853 and previous config saved to /var/cache/conftool/dbconfig/20220916-125841-root.json
* 13:32 jayme: ran ipvsadm -D -t 10.2.2.31:8748 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34852 and previous config saved to /var/cache/conftool/dbconfig/20220916-124850-root.json
* 13:32 jayme: ran ipvsadm -D -t 10.2.1.31:8748 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34851 and previous config saved to /var/cache/conftool/dbconfig/20220916-124336-root.json
* 13:32 jayme: ran ipvsadm -D -t 10.2.1.14:8888 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
* 12:43 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
* 13:25 kormat@cumin1001: dbctl commit (dc=all): 'Start depooling db2114 [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12628 and previous config saved to /var/cache/conftool/dbconfig/20200917-132513-kormat.json
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34850 and previous config saved to /var/cache/conftool/dbconfig/20220916-123346-root.json
* 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34849 and previous config saved to /var/cache/conftool/dbconfig/20220916-122831-root.json
* 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34848 and previous config saved to /var/cache/conftool/dbconfig/20220916-121841-root.json
* 13:19 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34847 and previous config saved to /var/cache/conftool/dbconfig/20220916-121326-root.json
* 13:17 marostegui: Stop MySQL on db2125 for on-site maintenance [[phab:T260670|T260670]]
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34846 and previous config saved to /var/cache/conftool/dbconfig/20220916-120336-root.json
* 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34845 and previous config saved to /var/cache/conftool/dbconfig/20220916-115821-root.json
* 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34844 and previous config saved to /var/cache/conftool/dbconfig/20220916-114935-root.json
* 13:13 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet
* 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34843 and previous config saved to /var/cache/conftool/dbconfig/20220916-114831-root.json
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.9
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34842 and previous config saved to /var/cache/conftool/dbconfig/20220916-114316-root.json
* 12:18 cmjohnson1: pdu swap maintenance beginning now for racks D1, D2 and C1 eqiad
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P34841 and previous config saved to /var/cache/conftool/dbconfig/20220916-113543-root.json
* 11:24 matthiasmullie: End Euro B&C
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34840 and previous config saved to /var/cache/conftool/dbconfig/20220916-113431-root.json
* 11:24 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/NavigationTiming/: Account for empty layout shift sources array (duration: 01m 05s)
* 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34839 and previous config saved to /var/cache/conftool/dbconfig/20220916-113325-root.json
* 11:22 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/WikimediaEvents/: Disable MediaSearch A/B test (duration: 01m 08s)
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P34838 and previous config saved to /var/cache/conftool/dbconfig/20220916-112750-root.json
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12627 and previous config saved to /var/cache/conftool/dbconfig/20200917-111028-marostegui.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34837 and previous config saved to /var/cache/conftool/dbconfig/20220916-111925-root.json
* 11:06 vgutierrez: update to acme-chief 0.29 on acmechief[12]001 - [[phab:T263006|T263006]]
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34836 and previous config saved to /var/cache/conftool/dbconfig/20220916-110420-root.json
* 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34835 and previous config saved to /var/cache/conftool/dbconfig/20220916-105819-ladsgroup.json
* 11:04 vgutierrez: upload acme-chief 0.29 to apt.wm.o (buster) - [[phab:T263006|T263006]]
* 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 11:03 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=eqiad
* 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34834 and previous config saved to /var/cache/conftool/dbconfig/20220916-105809-ladsgroup.json
* 10:58 marostegui: Stop mysql on db1125 for PDU mainteanance, lag will appear on s2, s4, s6 and s7 on labsdb hosts [[phab:T261459|T261459]]
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34832 and previous config saved to /var/cache/conftool/dbconfig/20220916-104916-root.json
* 10:58 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=codfw
* 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34831 and previous config saved to /var/cache/conftool/dbconfig/20220916-104303-ladsgroup.json
* 10:51 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=codfw
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34830 and previous config saved to /var/cache/conftool/dbconfig/20220916-103411-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12626 and previous config saved to /var/cache/conftool/dbconfig/20200917-104816-marostegui.json
* 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34829 and previous config saved to /var/cache/conftool/dbconfig/20220916-102756-ladsgroup.json
* 10:46 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34828 and previous config saved to /var/cache/conftool/dbconfig/20220916-101905-root.json
* 10:40 oblivian@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=wikifeeds
* 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34827 and previous config saved to /var/cache/conftool/dbconfig/20220916-101250-ladsgroup.json
* 10:34 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34826 and previous config saved to /var/cache/conftool/dbconfig/20220916-100400-root.json
* 10:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34825 and previous config saved to /var/cache/conftool/dbconfig/20220916-093635-root.json
* 10:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34824 and previous config saved to /var/cache/conftool/dbconfig/20220916-093121-root.json
* 10:20 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34823 and previous config saved to /var/cache/conftool/dbconfig/20220916-092130-root.json
* 10:18 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34822 and previous config saved to /var/cache/conftool/dbconfig/20220916-091616-root.json
* 10:17 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34821 and previous config saved to /var/cache/conftool/dbconfig/20220916-091234-ladsgroup.json
* 09:14 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 08:49 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1002 - [[phab:T262527|T262527]]
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 08:43 jayme: uncordoned kubestage1002 after kernel upgrade - [[phab:T262527|T262527]]
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34820 and previous config saved to /var/cache/conftool/dbconfig/20220916-090625-root.json
* 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34819 and previous config saved to /var/cache/conftool/dbconfig/20220916-090111-root.json
* 08:37 godog: graphite compress /var/log/carbon logs older than 2d
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34818 and previous config saved to /var/cache/conftool/dbconfig/20220916-085120-root.json
* 08:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34817 and previous config saved to /var/cache/conftool/dbconfig/20220916-084607-root.json
* 08:25 jayme: reboot kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34816 and previous config saved to /var/cache/conftool/dbconfig/20220916-083615-root.json
* 08:24 godog: graphite add 300G to /srv
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34815 and previous config saved to /var/cache/conftool/dbconfig/20220916-083102-root.json
* 07:55 jayme: draining kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 08:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:55 jayme: cordoning kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12624 and previous config saved to /var/cache/conftool/dbconfig/20200917-070145-marostegui.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34814 and previous config saved to /var/cache/conftool/dbconfig/20220916-082110-root.json
* 06:55 hashar: Taking a heap dump of Gerrit JVM
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34813 and previous config saved to /var/cache/conftool/dbconfig/20220916-081557-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12623 and previous config saved to /var/cache/conftool/dbconfig/20200917-061931-marostegui.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34812 and previous config saved to /var/cache/conftool/dbconfig/20220916-080605-root.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12622 and previous config saved to /var/cache/conftool/dbconfig/20200917-060312-marostegui.json
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34811 and previous config saved to /var/cache/conftool/dbconfig/20220916-080052-root.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12621 and previous config saved to /var/cache/conftool/dbconfig/20200917-055219-marostegui.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P34810 and previous config saved to /var/cache/conftool/dbconfig/20220916-075100-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for on-site maintenace', diff saved to https://phabricator.wikimedia.org/P12620 and previous config saved to /var/cache/conftool/dbconfig/20200917-055158-marostegui.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P34809 and previous config saved to /var/cache/conftool/dbconfig/20220916-074548-root.json
* 05:46 marostegui: Stop mysql on db1131 - [[phab:T262901|T262901]]
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34808 and previous config saved to /var/cache/conftool/dbconfig/20220916-074251-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2031 on es2 for the first time with minimal weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12619 and previous config saved to /var/cache/conftool/dbconfig/20200917-054226-marostegui.json
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P34807 and previous config saved to /var/cache/conftool/dbconfig/20220916-072958-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12618 and previous config saved to /var/cache/conftool/dbconfig/20200917-053503-marostegui.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34806 and previous config saved to /var/cache/conftool/dbconfig/20220916-072746-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12617 and previous config saved to /var/cache/conftool/dbconfig/20200917-052347-marostegui.json
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2011 as es1 master and es2017 as es3 master and then depool es2018 and es2012 to clone es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12616 and previous config saved to /var/cache/conftool/dbconfig/20200917-051741-marostegui.json
* 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12615 and previous config saved to /var/cache/conftool/dbconfig/20200917-050739-marostegui.json
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:53 marostegui: Deploy schema change on s1 eqiad primary master - [[phab:T238966|T238966]]
* 07:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:22 Krinkle: krinkle@mwmaint1002 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34805 and previous config saved to /var/cache/conftool/dbconfig/20220916-071241-root.json
* 01:22 Krinkle: krinkle@mwmaint2001 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34804 and previous config saved to /var/cache/conftool/dbconfig/20220916-065737-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34803 and previous config saved to /var/cache/conftool/dbconfig/20220916-064232-root.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34802 and previous config saved to /var/cache/conftool/dbconfig/20220916-062727-root.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34801 and previous config saved to /var/cache/conftool/dbconfig/20220916-061222-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34800 and previous config saved to /var/cache/conftool/dbconfig/20220916-055717-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34799 and previous config saved to /var/cache/conftool/dbconfig/20220916-055542-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P34798 and previous config saved to /var/cache/conftool/dbconfig/20220916-055424-root.json
* 05:51 marostegui: Install 10.6 on db1168 [[phab:T301879|T301879]]
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P34797 and previous config saved to /var/cache/conftool/dbconfig/20220916-055031-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P34795 and previous config saved to /var/cache/conftool/dbconfig/20220916-054438-root.json
* 01:57 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 09s)
* 01:57 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 01:54 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 10s)
* 01:54 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 00:14 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 17s)
* 00:14 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)


== 2020-09-16 ==
== 2022-09-15 ==
* 23:41 catrope@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs: [[phab:T262970|T262970]] (duration: 01m 06s)
* 23:51 mutante: gerrit1001 - disabled puppet - gerrit:832411
* 23:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs: [[phab:T262970|T262970]] (duration: 01m 06s)
* 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 23:37 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/GrowthExperiments/: Fix styling for mobile start module ([[phab:T258008|T258008]]); Revert wider task card on desktop ([[phab:T263042|T263042]], [[phab:T258704|T258704]]); Fix width of sidebar modules in narrow mode in variant A ([[phab:T263068|T263068]]) (duration: 01m 09s)
* 22:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs2001.codfw.wmnet with reason: [[phab:T316236|T316236]]
* 22:24 shdubsh: install prometheus-icinga-exporter 0.11 on icinga2001
* 21:30 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:25 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] (duration: 07m 06s)
* 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
* 20:18 thcipriani@deploy1002: thcipriani and dani: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:18 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832526{{!}}Increase coverage of Research Incentive Survey on idwiki (T316466)]]
* 20:04 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:15 thcipriani@deploy1002: Finished scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] (duration: 07m 39s)
* 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Vector search in header on testwiki and officewiki ([[phab:T262207|T262207]]) (duration: 01m 04s)
* 20:08 thcipriani@deploy1002: thcipriani and dcausse: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 18:00 brennen@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend: Backport: [[gerrit:627793{{!}}Check $coords matched some nodes before comparing contents (T263034)]] (duration: 01m 06s)
* 20:07 thcipriani@deploy1002: Started scap: Backport for [[gerrit:832323{{!}}Revert "cirrus: Handle transition to elasticsearch 7.10" (T308676)]]
* 17:58 joal@deploy1001: Finished deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0] (duration: 00m 08s)
* 19:26 ebernhardson: pool'd wdqs2001, some blockers before reload can start [[phab:T316236|T316236]]
* 17:58 joal@deploy1001: Started deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0]
* 18:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 17:51 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:39 dancy@deploy1002: Finished scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] (duration: 09m 53s)
* 17:50 joal@deploy1001: Started deploy [analytics/refinery@07056b0]: Regular analytics weekly train [analytics/refinery@07056b0]
* 18:38 cwhite: restart thanos-compact (thanos-fe2001) and swift_ring_manager (thanos-fe1001)
* 17:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:29 dancy@deploy1002: dancy and cscott: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 17:11 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:29 dancy@deploy1002: Started scap: Backport for [[gerrit:832547{{!}}Use more permissive match for TOC_PLACEHOLDER in parser output (T317857)]]
* 17:03 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2003.codfw.wmnet on all recursors
* 16:45 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2003.codfw.wmnet on all recursors
* 16:40 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 16:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:17 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 16:13 marostegui: Start mysql on db1093, db1109 and db1123 after pdu work is done
* 18:17 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2001.codfw.wmnet on all recursors
* 16:12 ryankemper: `wdqs` deploy complete, service is healthy
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2001.codfw.wmnet on all recursors
* 16:09 elukey: reinstall buster on an-tool1009 after a lot of tests (ganeti VM, so it is a manual work)
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1003.eqiad.wmnet on all recursors
* 16:00 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1003.eqiad.wmnet on all recursors
* 15:58 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1002.eqiad.wmnet on all recursors
* 15:49 ryankemper: sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'; sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1002.eqiad.wmnet on all recursors
* 15:49 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 18:16 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe1001.eqiad.wmnet on all recursors
* 15:48 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b7e2d0b]: 0.3.48 (duration: 14m 40s)
* 18:16 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe1001.eqiad.wmnet on all recursors
* 15:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:627871{{!}}Rename wmgWikibaseClientLocalEntitySourceName to wmgWikibaseClientItemAndPropertySourceName on Beta (T258060)]] (production no-op) (duration: 01m 04s)
* 18:15 ebernhardson: depool wcqs2001 for [[phab:T316236|T316236]]
* 15:35 ryankemper: Canary `wdqs1003` query tests looks good, proceeding to wdqs deploy for rest of fleet
* 18:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:33 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b7e2d0b]: 0.3.48
* 18:13 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 15:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:622994{{!}}Remove `wmgWikibaseClientLocalEntitySourceName` from InitialiseSettings.php (T258060)]] (duration: 01m 05s)
* 18:07 godog: restart envoyproxy on thanos-fe*
* 15:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:622993{{!}}Use `wmgWikibaseClientItemAndPropertySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php (T258060)]] (duration: 01m 02s)
* 18:06 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) thanos-fe2002.codfw.wmnet on all recursors
* 15:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:622612{{!}}Add `wmgWikibaseClientItemAndPropertySourceName` to InitialiseSettings.php (T258060)]] (duration: 01m 06s)
* 18:06 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache thanos-fe2002.codfw.wmnet on all recursors
* 14:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:41 volans: uploaded spicerack_0.0.43 to apt.wikimedia.org buster-wikimedia
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:39 cmjohnson1: pdu swap rack d7-eqiad, missed this in earlier log entry
* 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging BryanDavis out of all services on: 2047 hosts
* 14:34 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 16:16 andrew@cumin1001: START - Cookbook sre.idm.logout Logging BryanDavis out of all services on: 2047 hosts
* 14:02 Urbanecm: Change email address of User:Oversight@enwiki to oversight-en-wp@wikipedia.org as OTRS is back up ([[phab:T262733|T262733]])
* 15:39 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:48 marostegui: Start mysql on db1121 after PDU work
* 15:37 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 13:46 James_F: Restarting CI Jenkins for [[phab:T262827|T262827]]
* 15:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 13:08 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet
* 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
* 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.9
* 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
* 12:58 elukey: upload hue_4.7.1-1+deb10u1 to buster-wikimedia
* 15:22 hnowlan: starting cassandra on sessionstore1001-a
* 12:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:18 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
* 12:56 cdanis@cumin1001: START - Cookbook sre.network.cf
* 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34792 and previous config saved to /var/cache/conftool/dbconfig/20220915-151131-ladsgroup.json
* 12:49 cmjohnson1: start pdu swap in racks c6 and c7, d8
* 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34791 and previous config saved to /var/cache/conftool/dbconfig/20220915-145625-ladsgroup.json
* 12:36 moritzm: powercycling mw2256 (went down with overheated CPU)
* 14:41 moritzm: installing libtirpc security updates
* 12:29 moritzm: restarting exim on MXes to pick up GNUTLS update
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P34790 and previous config saved to /var/cache/conftool/dbconfig/20220915-144118-ladsgroup.json
* 11:29 moritzm: restarting slapd on LDAP replicas to pick up GNUTLS update
* 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34789 and previous config saved to /var/cache/conftool/dbconfig/20220915-142612-ladsgroup.json
* 11:18 moritzm: installing gnutls28 security updates on remaining stretch hosts
* 14:01 sukhe: retarting bird.service on A:dns-auth for zlib update
* 11:12 jforrester@deploy1001: Synchronized php-1.36.0-wmf.9/includes/filerepo/file: [[phab:T263014|T263014]] Revert "Remove support for (Archived{{!}}OldLocal)File::userCan without a user" (duration: 01m 04s)
* 14:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6b9784a0708cf1e7762034ccfba7e5604b2f6dc2}}: Enable the Vue version of the mentee overview in pilot wikis ([[phab:T300532|T300532]]) (duration: 03m 45s)
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2027 and es2028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12606 and previous config saved to /var/cache/conftool/dbconfig/20200916-103324-marostegui.json
* 13:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d] (duration: 00m 09s)
* 10:20 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.9
* 13:58 aqu@deploy1002: Started deploy [airflow-dags/analytics@b9be20d]: Regular analytics weekly train [airflow-dags@b9be20d]
* 10:14 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.9 (duration: 46m 07s)
* 13:57 sukhe: retarting haproxy.service on A:dns-auth for zlib update
* 10:10 ema: upload python-acme 0.31.0-2wm1 to buster-wikimedia [[phab:T263006|T263006]]
* 13:57 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d] (duration: 00m 10s)
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12605 and previous config saved to /var/cache/conftool/dbconfig/20200916-100548-marostegui.json
* 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@b9be20d]: Regular analytics weekly train TEST [airflow-dags@b9be20d]
* 10:01 akosiaris: [[phab:T187984|T187984]] Shutdown mendelevium.
* 13:51 jayme: updated rsyslog to 8.2208.0-1~bpo11+1 on all kubernetes masters and nodes - [[phab:T289766|T289766]]
* 09:43 jynus: deploying max_packet_size change to m3 instances, too
* 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.9
* 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383] (duration: 06m 01s)
* 09:26 liw: moving train 1.36.0-wmf.9 to testwikis
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:22 jynus: restarting gerrit service on gerrit1001, unresponsive
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12603 and previous config saved to /var/cache/conftool/dbconfig/20200916-091535-marostegui.json
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:13 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 0 - [[phab:T262290|T262290]]
* 13:41 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST (second try after freeing up some disk space) [analytics/refinery@278c383]
* 09:08 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 1 - [[phab:T262290|T262290]]
* 13:38 sukhe: restarting bird.service on A:dns-rec for zlib update
* 08:52 marostegui: Stop mysql on db1121, db1123, db1093 and db1109 for PDU work [[phab:T261454|T261454]] [[phab:T261457|T261457]]
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:52 XioNoX: asw-d-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:50 jynus: deploy new max_allowed_packet configuration to m1, m2 and m5 dbs
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12601 and previous config saved to /var/cache/conftool/dbconfig/20200916-084916-marostegui.json
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:42 awight: finished security backport for https://phabricator.wikimedia.org/T262628
* 13:33 sukhe: restarting pdns-recursor on A:dns-rec for zlib update
* 08:41 awight@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FileImporter/src/Services/ImportPlanValidator.php: Security patch for [[phab:T262628|T262628]] (duration: 00m 59s)
* 13:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.28/extensions/GrowthExperiments/: {{Gerrit|f592e85858d17a2de99cde93627054ee4972c2bd}}: Mentee overview: avoid requiring the non-vue mentee overview script when loading the Vue one ([[phab:T300532|T300532]]) (duration: 04m 05s)
* 08:41 XioNoX: asw-c-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:27 XioNoX: asw-b-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 12:50 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:26 awight: beginning security backport for https://phabricator.wikimedia.org/T262628
* 12:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 08:17 XioNoX: asw-a-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 12:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: temporarily disabled due to sessionstore issues
* 08:04 akosiaris: [[phab:T187984|T187984]] Validated that ticket.wikimedia.org works, proceeding with a wider announcement
* 12:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS buster
* 08:02 XioNoX: asw2-d-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 12:17 jayme: fleet wide update of prometheus-rsyslog-exporter to 0.0.0+git20201008-4 - [[phab:T289766|T289766]]
* 07:49 akosiaris: [[phab:T187984|T187984]] Switch over ticket.discovery.wmnet to otrs1001
* 12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 07:48 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
* 07:44 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34787 and previous config saved to /var/cache/conftool/dbconfig/20220915-120013-root.json
* 07:40 XioNoX: asw2-c-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 11:51 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 07:37 akosiaris: [[phab:T187984|T187984]] Tested inbound email successfully
* 11:50 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 07:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 07:26 akosiaris: [[phab:T187984|T187984]] Tested outbound email, switching inbound email configuration and performing tests
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34786 and previous config saved to /var/cache/conftool/dbconfig/20220915-114508-root.json
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12600 and previous config saved to /var/cache/conftool/dbconfig/20200916-072614-marostegui.json
* 11:44 hnowlan@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1001.eqiad.wmnet with OS buster
* 07:22 jayme@cumin1001: START - Cookbook sre.hosts.decommission
* 11:43 moritzm: restart exim on lists1001 to pick up zlib security updates
* 07:22 jayme@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34785 and previous config saved to /var/cache/conftool/dbconfig/20220915-113003-root.json
* 07:21 jayme@cumin1001: START - Cookbook sre.hosts.decommission
* 11:22 jayme: importing prometheus-rsyslog-exporter 0.0.0+git20201008-4 to stretch-wikimedia, buster-wikimedia, bullseye-wikimedia - [[phab:T289766|T289766]]
* 07:12 akosiaris: [[phab:T187984|T187984]] Disable gravatar in system configuration to avoid leaking agent PII through a 3rd party service
* 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS buster
* 07:03 akosiaris: [[phab:T187984|T187984]] validated that the OTRS installation is functional over SSH
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 07:02 akosiaris: [[phab:T187984|T187984]] migration script done. Config updates, rebuilds, package upgrades/reinstall and index rebuilds done
* 11:15 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 06:28 godog: codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34784 and previous config saved to /var/cache/conftool/dbconfig/20220915-111458-root.json
* 06:20 kart_: Updated cxserver to 2020-08-30-011854-production ([[phab:T253439|T253439]], [[phab:T260557|T260557]])
* 11:12 hnowlan: sessionstore1001: c-foreach-nt drain
* 06:20 XioNoX: asw2-b-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 11:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 06:15 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1001.eqiad.wmnet with reason: Testing reimage
* 06:11 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'pool db2129 into s6 API', diff saved to https://phabricator.wikimedia.org/P34783 and previous config saved to /var/cache/conftool/dbconfig/20220915-110453-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 for the first time with minimum weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12599 and previous config saved to /var/cache/conftool/dbconfig/20200916-061013-marostegui.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34782 and previous config saved to /var/cache/conftool/dbconfig/20220915-105953-root.json
* 06:08 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34781 and previous config saved to /var/cache/conftool/dbconfig/20220915-104448-root.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12598 and previous config saved to /var/cache/conftool/dbconfig/20200916-060717-marostegui.json
* 10:36 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 to clone es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12597 and previous config saved to /var/cache/conftool/dbconfig/20200916-055535-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34780 and previous config saved to /var/cache/conftool/dbconfig/20220915-102943-root.json
* 05:53 XioNoX: asw2-a-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34779 and previous config saved to /var/cache/conftool/dbconfig/20220915-101438-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12596 and previous config saved to /var/cache/conftool/dbconfig/20200916-055108-marostegui.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34778 and previous config saved to /var/cache/conftool/dbconfig/20220915-101425-root.json
* 05:50 XioNoX: msw1-codfw> request system snapshot slice alternate - [[phab:T262290|T262290]]
* 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2027 and es2028 to dbctl [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12595 and previous config saved to /var/cache/conftool/dbconfig/20200916-053918-marostegui.json
* 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db2131.codfw.wmnet with reason: reboot
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12594 and previous config saved to /var/cache/conftool/dbconfig/20200916-053507-marostegui.json
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P34777 and previous config saved to /var/cache/conftool/dbconfig/20220915-100212-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into vslow', diff saved to https://phabricator.wikimedia.org/P12593 and previous config saved to /var/cache/conftool/dbconfig/20200916-052343-marostegui.json
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34775 and previous config saved to /var/cache/conftool/dbconfig/20220915-095920-root.json
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12592 and previous config saved to /var/cache/conftool/dbconfig/20200916-052241-marostegui.json
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 05:07 marostegui: Repool labsdb1010
* 09:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 02:22 mutante: deneb - sudo systemctl start package_builder_Clean_up_build_directory to fix icinga alert after failed build attempts
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 09:57 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 09:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34774 and previous config saved to /var/cache/conftool/dbconfig/20220915-094415-root.json
* 09:38 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383] (duration: 14m 21s)
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34773 and previous config saved to /var/cache/conftool/dbconfig/20220915-092910-root.json
* 09:23 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@278c383]
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34772 and previous config saved to /var/cache/conftool/dbconfig/20220915-091405-root.json
* 09:13 aqu@deploy1002: Finished deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383] (duration: 00m 08s)
* 09:13 aqu@deploy1002: Started deploy [analytics/refinery@278c383] (thin): Regular analytics weekly train THIN [analytics/refinery@278c383]
* 09:12 aqu@deploy1002: Finished deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383] (duration: 27m 31s)
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34770 and previous config saved to /var/cache/conftool/dbconfig/20220915-085900-root.json
* 08:49 apergos: UTC backport training window closed at lsat
* 08:46 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 08:45 aqu@deploy1002: Started deploy [analytics/refinery@278c383]: Regular analytics weekly train [analytics/refinery@278c383]
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34768 and previous config saved to /var/cache/conftool/dbconfig/20220915-084355-root.json
* 08:43 aqu: about to deploy analytics/refinery
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34767 and previous config saved to /var/cache/conftool/dbconfig/20220915-084046-root.json
* 08:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34766 and previous config saved to /var/cache/conftool/dbconfig/20220915-082851-root.json
* 08:26 tsepothoabala@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832339{{!}}Enable action blocks on ptwiki (T317157)]] (duration: 04m 07s)
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34765 and previous config saved to /var/cache/conftool/dbconfig/20220915-082541-root.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34764 and previous config saved to /var/cache/conftool/dbconfig/20220915-082112-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34763 and previous config saved to /var/cache/conftool/dbconfig/20220915-081627-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 codfw master [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34762 and previous config saved to /var/cache/conftool/dbconfig/20220915-081517-marostegui.json
* 08:14 marostegui: Starting s6 codfw failover from db2129 to db2114 - [[phab:T317850|T317850]]
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34761 and previous config saved to /var/cache/conftool/dbconfig/20220915-081036-root.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34760 and previous config saved to /var/cache/conftool/dbconfig/20220915-080607-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 from API [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34759 and previous config saved to /var/cache/conftool/dbconfig/20220915-080157-root.json
* 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary codfw s6 [[phab:T317850|T317850]]
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 [[phab:T317850|T317850]]', diff saved to https://phabricator.wikimedia.org/P34758 and previous config saved to /var/cache/conftool/dbconfig/20220915-080122-root.json
* 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary codfw s6 [[phab:T317850|T317850]]
* 07:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34757 and previous config saved to /var/cache/conftool/dbconfig/20220915-075531-root.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34756 and previous config saved to /var/cache/conftool/dbconfig/20220915-075102-root.json
* 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
* 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
* 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2133,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2134,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on db[2135,2160].codfw.wmnet with reason: reboot
* 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34755 and previous config saved to /var/cache/conftool/dbconfig/20220915-074026-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34754 and previous config saved to /var/cache/conftool/dbconfig/20220915-073557-root.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34753 and previous config saved to /var/cache/conftool/dbconfig/20220915-072520-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34752 and previous config saved to /var/cache/conftool/dbconfig/20220915-072053-root.json
* 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: reboot
* 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2151.codfw.wmnet with reason: reboot
* 07:14 moritzm: installing zlib security updates
* 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34751 and previous config saved to /var/cache/conftool/dbconfig/20220915-071015-root.json
* 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
* 07:06 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34750 and previous config saved to /var/cache/conftool/dbconfig/20220915-070548-root.json
* 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2115 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P34749 and previous config saved to /var/cache/conftool/dbconfig/20220915-065510-root.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34748 and previous config saved to /var/cache/conftool/dbconfig/20220915-065043-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db2096 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34747 and previous config saved to /var/cache/conftool/dbconfig/20220915-064750-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34746 and previous config saved to /var/cache/conftool/dbconfig/20220915-064635-marostegui.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2096 to x1 primary and set section read-write [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34745 and previous config saved to /var/cache/conftool/dbconfig/20220915-064525-root.json
* 06:44 marostegui: Starting x1 codfw failover from db2115 to db2096 - [[phab:T317842|T317842]]
* 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T317842|T317842]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2096 with weight 0 [[phab:T317842|T317842]]', diff saved to https://phabricator.wikimedia.org/P34744 and previous config saved to /var/cache/conftool/dbconfig/20220915-064014-root.json
* 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 [[phab:T317842|T317842]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34743 and previous config saved to /var/cache/conftool/dbconfig/20220915-063538-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34742 and previous config saved to /var/cache/conftool/dbconfig/20220915-061421-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 codfw [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34741 and previous config saved to /var/cache/conftool/dbconfig/20220915-061317-marostegui.json
* 06:12 marostegui: Starting s3 codfw failover from db2105 to db2127 - [[phab:T317839|T317839]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 [[phab:T317839|T317839]]', diff saved to https://phabricator.wikimedia.org/P34740 and previous config saved to /var/cache/conftool/dbconfig/20220915-060307-root.json
* 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Codfw switchover s3 [[phab:T317839|T317839]]
* 06:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Codfw switchover s3 [[phab:T317839|T317839]]
* 05:32 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]
* 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1189.eqiad.wmnet with reason: down [[phab:T317662|T317662]]


== 2020-09-15 ==
== 2022-09-14 ==
* 23:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|1c0b0d161fe1024d6d08a27bbacf5b62c56c9c01}}: Fix APCOND_FR_NEVERBLOCKED handling ([[phab:T262970|T262970]]) (duration: 00m 56s)
* 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34739 and previous config saved to /var/cache/conftool/dbconfig/20220914-220822-ladsgroup.json
* 23:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|5beace32a396adfcce46b04e7f969b2f9f9effda}}: Fix APCOND_FR_NEVERBLOCKED handling ([[phab:T262970|T262970]]) (duration: 00m 58s)
* 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
* 23:14 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|ac8bd3894f2dc8f2735cc9fa7b860af1d91c6707}}: flaggedrevs: Remove non-existent config options (duration: 00m 58s)
* 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
* 23:07 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 23:00 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|62b21d55a8f0a94b8cd268d5024df0cf64013dd5}}: Revert "Remove abusefilter-view right grant from wmf-config" ([[phab:T255506|T255506]]) (duration: 00m 59s)
* 22:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
* 20:44 brennen: removing extraneous recursive symlink /srv/mediawiki-staging/php-1.36.0-wmf.9/php-1.36.0-wmf.8
* 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34738 and previous config saved to /var/cache/conftool/dbconfig/20220914-220744-ladsgroup.json
* 18:32 Urbanecm: Morning B&C done
* 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34737 and previous config saved to /var/cache/conftool/dbconfig/20220914-215238-ladsgroup.json
* 18:28 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|084729b7fd0716f11265f1b37570afc120b27109}}: Remove abusefilter-view right grant from wmf-config ([[phab:T255506|T255506]]) (duration: 00m 56s)
* 21:38 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 01m 48s)
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1d3456570b80b1d8af1d2b71975496e54f87b24e}}: Enable MediaWiki client errors on frwiki ([[phab:T255585|T255585]]) (duration: 00m 57s)
* 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P34736 and previous config saved to /var/cache/conftool/dbconfig/20220914-213732-ladsgroup.json
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|79004b7e503c7274fa56d2699b423b6919fbc869}}: Enable the reverted tag on all wikis ([[phab:T164307|T164307]]) (duration: 00m 56s)
* 21:37 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
* 17:59 krinkle@deploy1001: Synchronized src/ServiceConfig.php: {{Gerrit|If727ae4335}} (duration: 00m 56s)
* 21:36 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
* 17:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out (duration: 37m 42s)
* 21:35 dduvall@deploy1002: Installation of scap version "4.19.1" completed for 561 hosts
* 17:05 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out
* 21:35 dduvall@deploy1002: Installing scap version "4.19.1" for 561 hosts
* 17:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint (duration: 86m 46s)
* 21:34 dduvall: Deploying scap 4.19.1 (https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/832297/1/changelog)
* 17:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34735 and previous config saved to /var/cache/conftool/dbconfig/20220914-212225-ladsgroup.json
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:38 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:44 dancy@deploy1002: Sync cancelled.
* 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:44 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:44 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:44 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:40 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:26 shdubsh: manual install prometheus-icinga-exporter upgrade on icinga2001
* 20:40 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:53 godog: switch grafana to eqiad - [[phab:T259143|T259143]]
* 20:39 dancy@deploy1002: Started scap: testing
* 14:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:38 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 05m 49s)
* 14:42 volans@cumin1001: START - Cookbook sre.dns.netbox
* 20:34 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:38 XioNoX: remove old SNMP community from all network devices
* 20:34 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:34 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - [[phab:T251609|T251609]] (duration: 00m 56s)
* 20:33 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:21 otto@deploy1001: sync-file aborted: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - [[phab:T251609|T251609]] (duration: 00m 06s)
* 20:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:28 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:24 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:24 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:21 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:19 dancy@deploy1002: deploy-promote aborted:  (duration: 08m 52s)
* 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:19 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 01m 24s)
* 13:18 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 20:18 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:14 cmjohnson1: beginning work inside racks c2, c3, c4 and c5 eqiad
* 20:18 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, s8, add db1092 temporarily', diff saved to https://phabricator.wikimedia.org/P12589 and previous config saved to /var/cache/conftool/dbconfig/20200915-121849-marostegui.json
* 20:14 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:18 jbond42: update libxml2 on stretch and jessie
* 20:13 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:08 jbond42: rolling restart of php7.2-fpm
* 20:13 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:05 elukey: roll restart cassandra on aqs* to pick up openjdk upgrades
* 20:12 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:05 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 20:09 dancy@deploy1002: Sync cancelled.
* 11:44 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|294931fc6eb9e365894ec0cf94c155d55ecae549}}: Revert "Disable DynamicPageList on ruwikinews" ([[phab:T262240|T262240]]; [[phab:T262391|T262391]]) (duration: 00m 58s)
* 20:09 dancy@deploy1002: dancy: testing synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:17 effie: roll out scap 3.15.0-1 to all - [[phab:T261234|T261234]]
* 20:09 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:12 XioNoX: mass update SCS SNMP community in LibreNMS - [[phab:T246890|T246890]]
* 20:06 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 20:02 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:56 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:02 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:54 XioNoX: mass update PDU SNMP community in LibreNMS - [[phab:T246890|T246890]]
* 20:02 dancy@deploy1002: Started scap: testing
* 10:48 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 20:01 TheresNoTime: Nothing to deploy in this UTC late backport window
* 10:36 moritzm: uploaded libxml2 2.9.1+dfsg1-5+deb8u8+wmf1 for jessie-wikimedia
* 19:57 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
* 10:33 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 19:57 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
* 10:22 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "testwikiswikis to 1.36.0-wmf.9"
* 19:55 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 12s)
* 10:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 19:55 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:22 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts [[phab:T261455|T261455]]
* 19:51 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 19:51 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 19:49 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 19:49 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 19:49 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 19:48 dancy@deploy1002: Started scap: testing
* 09:04 gehel: restart elasticsearch on elastic2029 (high GC
* 19:46 dancy@deploy1002: scap failed: CalledProcessError Command '['helmfile', '-e', 'eqiad', 'apply']' returned non-zero exit status 1. (duration: 07m 23s)
* 09:01 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 19:46 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 19:39 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:39 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:53 elukey: roll restart druid zookeeper clusters for openjdk upgrades
* 19:39 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 19:38 dancy@deploy1002: Started scap: testing
* 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 19:38 dancy@deploy1002: sync-world aborted: testing (duration: 13m 25s)
* 08:13 marostegui: Stop MySQL on labsdb1010 for PDU maintenance [[phab:T261456|T261456]]
* 19:35 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:05 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_498180604" --store-class=LCStoreCDB --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 11m 10s)
* 19:26 dancy: dancy@deploy1002 touch /var/lib/deploy-mwdebug/pause
* 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 19:24 dancy@deploy1002: Started scap: testing
* 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:01 akosiaris: [[phab:T187984|T187984]] migration script on otrs1001 proceeding as expected. Still in step 31/44, but that's what we saw in the test migration
* 19:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling (duration: 02m 03s)
* 07:54 liw@deploy1001: Started scap: testwikis to 1.36.0-wmf.9
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 godog: swift codfw add ms-be2057 at object weight 100 - [[phab:T261633|T261633]]
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:19 elukey: roll restart druid cluster to pick up openjdk updates
* 19:15 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@48e506e]: drop-snapshots: Remove directory handling
* 07:19 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 XioNoX: pre-configure SGIX port on cr2-eqsin
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 06:57 liw: 1.36.0-wmf.9 was branched at {{Gerrit|7269b6b57b6f79646b96ece818d2f2d38e0d2ea6}} for [[phab:T257977|T257977]]
* 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 06:08 marostegui: Stop mysql on es2011 to clone es2028
* 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 to clone es2028', diff saved to https://phabricator.wikimedia.org/P12585 and previous config saved to /var/cache/conftool/dbconfig/20200915-060623-marostegui.json
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2012 as es1 codfw master [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12584 and previous config saved to /var/cache/conftool/dbconfig/20200915-060508-marostegui.json
* 18:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 05:33 marostegui: Depool labsdb1010 for PDU maintenance
* 18:50 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki (duration: 02m 05s)
* 05:10 marostegui: Restart sanitarium hosts on eqiad and codfw [[phab:T262832|T262832]]
* 18:48 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e358893]: drop-snapshots: tables are partitioned by wiki
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:36 dancy@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]] (duration: 04m 41s)
* 18:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:31 dancy@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.1  refs [[phab:T314190|T314190]]
* 16:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
* 16:10 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:08 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 16:05 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
* 16:04 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 - volans@cumin1001
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2024.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2024.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2002.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2002.codfw.wmnet on all recursors
* 15:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1026.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1026.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1029.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1029.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1028.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1028.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash1027.eqiad.wmnet on all recursors
* 15:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash1027.eqiad.wmnet on all recursors
* 15:55 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2001.codfw.wmnet on all recursors
* 15:55 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2001.codfw.wmnet on all recursors
* 15:50 dduvall@deploy1002: Finished deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002 (duration: 00m 39s)
* 15:49 dduvall@deploy1002: Started deploy [phabricator/deployment@3137c92]: testing phabricator deployment to phab2002
* 15:48 dduvall: testing phabricator deployment to phab2002. should have no production impact (not serving traffic, no access to r/w db)
* 15:24 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:22 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 15:22 cwhite@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:17 cwhite@cumin2002: START - Cookbook sre.dns.netbox
* 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34732 and previous config saved to /var/cache/conftool/dbconfig/20220914-145956-ladsgroup.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34731 and previous config saved to /var/cache/conftool/dbconfig/20220914-144449-ladsgroup.json
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P34730 and previous config saved to /var/cache/conftool/dbconfig/20220914-142941-ladsgroup.json
* 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34729 and previous config saved to /var/cache/conftool/dbconfig/20220914-141434-ladsgroup.json
* 14:06 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.thumbor (exit_code=0) rolling restart_daemons on A:thumbor-codfw
* 13:59 jmm@cumin2002: START - Cookbook sre.misc-clusters.thumbor rolling restart_daemons on A:thumbor-codfw
* 13:48 moritzm: imported zlib 1:1.2.8.dfsg-5+deb9u1+wmf1 to apt.wikimedia.org
* 13:40 Lucas_WMDE: UTC afternoon backport+config window done
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bnwiktionary --fix # [[phab:T317745|T317745]] – dry run result: 6043 links to fix, 6043 were resolvable, 0 were deleted
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:831970{{!}}Move namespace in the Bengali Wiktionary: উইকিসরাস → পরিশিষ্ট and set wgNamespaceAliases for newly created namespaces (T317745)]] (duration: 03m 41s)
* 13:28 topranks: upgrading routinator on rpki2002
* 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:20 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:832145{{!}}Enable Content/Section translation on WPs with new MT support from Google (T313296)]] (duration: 03m 39s)
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:831872{{!}}Enable Section Translation in Odia Wikipedia (T313300)]] (duration: 03m 55s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:54 jayme: imported rsyslog 8.2208.0-1~bpo11+1 into bullseye-wikimedia component/rsyslog-k8s - [[phab:T289766|T289766]]
* 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
* 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34725 and previous config saved to /var/cache/conftool/dbconfig/20220914-115920-ladsgroup.json
* 11:49 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr2-eqdfw,cr2-eqdfw IPv6
* 11:49 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr2-eqdfw,cr2-eqdfw IPv6
* 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34723 and previous config saved to /var/cache/conftool/dbconfig/20220914-114413-ladsgroup.json
* 11:29 topranks: rebooting cr2-eqdfw to complete upgrade
* 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P34721 and previous config saved to /var/cache/conftool/dbconfig/20220914-112907-ladsgroup.json
* 11:14 topranks: Shutting down internet transit and peering on cr2-eqdfw in advance of upgrade reboot
* 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34719 and previous config saved to /var/cache/conftool/dbconfig/20220914-111400-ladsgroup.json
* 11:02 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
* 11:02 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
* 11:01 topranks: Prepping to upgrade JunOS on cr2-eqdfw.  Adjusting OSPF costs to force traffic via alternate POPs.
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34717 and previous config saved to /var/cache/conftool/dbconfig/20220914-103810-root.json
* 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:24 kharlan@deploy1002: Synchronized php-1.40.0-wmf.1/extensions/WikimediaEvents/includes/BlockMetrics/BlockMetricsHooks.php: Backport: [[gerrit:831969{{!}}BlockMetrics: Update to new event schema version (T306018)]] (duration: 03m 48s)
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34715 and previous config saved to /var/cache/conftool/dbconfig/20220914-102305-root.json
* 10:18 moritzm: import routinator 0.11.3-1bullseye  to thirdparty/routinator
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34714 and previous config saved to /var/cache/conftool/dbconfig/20220914-100800-root.json
* 10:00 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 09:59 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 09:58 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:57 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:57 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 09:53 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
* 09:53 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
* 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34713 and previous config saved to /var/cache/conftool/dbconfig/20220914-095255-root.json
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34712 and previous config saved to /var/cache/conftool/dbconfig/20220914-093750-root.json
* 09:27 moritzm: installing zlib/libxslt security updates on buster
* 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34711 and previous config saved to /var/cache/conftool/dbconfig/20220914-092620-ladsgroup.json
* 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34710 and previous config saved to /var/cache/conftool/dbconfig/20220914-092558-ladsgroup.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34709 and previous config saved to /var/cache/conftool/dbconfig/20220914-092245-root.json
* 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34708 and previous config saved to /var/cache/conftool/dbconfig/20220914-091052-ladsgroup.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 3%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34707 and previous config saved to /var/cache/conftool/dbconfig/20220914-090740-root.json
* 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
* 09:05 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
* 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
* 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P34706 and previous config saved to /var/cache/conftool/dbconfig/20220914-085545-ladsgroup.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34705 and previous config saved to /var/cache/conftool/dbconfig/20220914-085235-root.json
* 08:50 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
* 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-test
* 08:49 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-test
* 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34704 and previous config saved to /var/cache/conftool/dbconfig/20220914-084039-ladsgroup.json
* 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart on A:wdqs-test
* 08:38 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart on A:wdqs-test
* 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
* 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maint needed
* 08:32 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]] (duration: 06m 51s)
* 08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:25 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:832157{{!}}Stop writing to the old templatelinks columns of enwiki (T312865)]]
* 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1024.eqiad.wmnet with reason: down
* 08:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1024.eqiad.wmnet with reason: down
* 08:02 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es5 [[phab:T317739|T317739]] (duration: 03m 38s)
* 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34703 and previous config saved to /var/cache/conftool/dbconfig/20220914-075722-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34702 and previous config saved to /var/cache/conftool/dbconfig/20220914-075550-marostegui.json
* 07:55 marostegui: Starting es5 eqiad failover from es1024 to es1023 [[phab:T317739|T317739]]
* 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:50 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es5 [[phab:T317739|T317739]] (duration: 04m 13s)
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1023 with weight 0 [[phab:T317739|T317739]]', diff saved to https://phabricator.wikimedia.org/P34701 and previous config saved to /var/cache/conftool/dbconfig/20220914-074617-marostegui.json
* 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T317739|T317739]]
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 [[phab:T317739|T317739]]
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34700 and previous config saved to /var/cache/conftool/dbconfig/20220914-074248-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34699 and previous config saved to /var/cache/conftool/dbconfig/20220914-072743-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34698 and previous config saved to /var/cache/conftool/dbconfig/20220914-071238-root.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34697 and previous config saved to /var/cache/conftool/dbconfig/20220914-065733-root.json
* 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34696 and previous config saved to /var/cache/conftool/dbconfig/20220914-064330-ladsgroup.json
* 06:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34695 and previous config saved to /var/cache/conftool/dbconfig/20220914-064309-ladsgroup.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34694 and previous config saved to /var/cache/conftool/dbconfig/20220914-064228-root.json
* 06:38 elukey: restart kafka on kafka-logging2003 to pick up  the new PKI TLS settings
* 06:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
* 06:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging2003.codfw.wmnet with reason: Kafka PKI upgrade
* 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34693 and previous config saved to /var/cache/conftool/dbconfig/20220914-062802-ladsgroup.json
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: Repooling for warm up after upgrade', diff saved to https://phabricator.wikimedia.org/P34692 and previous config saved to /var/cache/conftool/dbconfig/20220914-062723-root.json
* 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P34691 and previous config saved to /var/cache/conftool/dbconfig/20220914-061256-ladsgroup.json
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: down
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: down
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34690 and previous config saved to /var/cache/conftool/dbconfig/20220914-060913-root.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 codfw primary [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34689 and previous config saved to /var/cache/conftool/dbconfig/20220914-060807-marostegui.json
* 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34688 and previous config saved to /var/cache/conftool/dbconfig/20220914-055749-ladsgroup.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 [[phab:T317735|T317735]]', diff saved to https://phabricator.wikimedia.org/P34687 and previous config saved to /var/cache/conftool/dbconfig/20220914-055156-marostegui.json
* 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T317735|T317735]]
* 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 [[phab:T317735|T317735]]
* 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34686 and previous config saved to /var/cache/conftool/dbconfig/20220914-052510-ladsgroup.json
* 05:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 05:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
* 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34685 and previous config saved to /var/cache/conftool/dbconfig/20220914-052448-ladsgroup.json
* 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34684 and previous config saved to /var/cache/conftool/dbconfig/20220914-050942-ladsgroup.json
* 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P34683 and previous config saved to /var/cache/conftool/dbconfig/20220914-045435-ladsgroup.json
* 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34682 and previous config saved to /var/cache/conftool/dbconfig/20220914-043929-ladsgroup.json
* 03:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34681 and previous config saved to /var/cache/conftool/dbconfig/20220914-035624-ladsgroup.json
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
* 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34680 and previous config saved to /var/cache/conftool/dbconfig/20220914-035546-ladsgroup.json
* 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34679 and previous config saved to /var/cache/conftool/dbconfig/20220914-034040-ladsgroup.json
* 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34678 and previous config saved to /var/cache/conftool/dbconfig/20220914-033921-ladsgroup.json
* 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P34677 and previous config saved to /var/cache/conftool/dbconfig/20220914-032533-ladsgroup.json
* 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34676 and previous config saved to /var/cache/conftool/dbconfig/20220914-032415-ladsgroup.json
* 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34675 and previous config saved to /var/cache/conftool/dbconfig/20220914-031027-ladsgroup.json
* 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P34674 and previous config saved to /var/cache/conftool/dbconfig/20220914-030908-ladsgroup.json
* 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34673 and previous config saved to /var/cache/conftool/dbconfig/20220914-025402-ladsgroup.json
* 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34672 and previous config saved to /var/cache/conftool/dbconfig/20220914-013204-ladsgroup.json
* 01:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 01:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34671 and previous config saved to /var/cache/conftool/dbconfig/20220914-013143-ladsgroup.json
* 01:24 eileen: civicrm upgraded from {{Gerrit|d91b4a2c}} to {{Gerrit|e82d9cd0}}
* 01:18 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: elastic 6.8 -> 7.10 - bking@cumin1001 - [[phab:T317686|T317686]]
* 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34670 and previous config saved to /var/cache/conftool/dbconfig/20220914-011637-ladsgroup.json
* 01:14 ejegg: disabled delete_deleted_contacts job (will take effect when current job ends)
* 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P34669 and previous config saved to /var/cache/conftool/dbconfig/20220914-010130-ladsgroup.json
* 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34668 and previous config saved to /var/cache/conftool/dbconfig/20220914-004624-ladsgroup.json


== 2020-09-14 ==
== 2022-09-13 ==
* 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 23:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34667 and previous config saved to /var/cache/conftool/dbconfig/20220913-234607-ladsgroup.json
* 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 23:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 23:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34666 and previous config saved to /var/cache/conftool/dbconfig/20220913-234546-ladsgroup.json
*
* 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34665 and previous config saved to /var/cache/conftool/dbconfig/20220913-233039-ladsgroup.json
* 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P34664 and previous config saved to /var/cache/conftool/dbconfig/20220913-231533-ladsgroup.json
* 23:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance


== 2020-09-13 ==
== 2022-09-12 ==
* 23:47 Urbanecm: Change email address of User:Oversight@enwiki to oversight-l@lists.wikimedia.org as part of OTRS downtime preparation ([[phab:T262733|T262733]])
* 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P34556 and previous config saved to /var/cache/conftool/dbconfig/20220912-234833-ladsgroup.json
* 05:51 effie: sudo -i depool mw2297
* 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34555 and previous config saved to /var/cache/conftool/dbconfig/20220912-233327-ladsgroup.json
* 22:53 mutante: phabricator - disabling MediaWiki extension repositories in Diffusion that have 0 commits - [[phab:T296022|T296022]] - [[phab:T315706|T315706]]
* 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34554 and previous config saved to /var/cache/conftool/dbconfig/20220912-224006-ladsgroup.json
* 22:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34553 and previous config saved to /var/cache/conftool/dbconfig/20220912-223927-ladsgroup.json
* 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34552 and previous config saved to /var/cache/conftool/dbconfig/20220912-222420-ladsgroup.json
* 22:23 mutante: phabricator - disabling repositories: tool-xh-bot, tool-editor-contribution-dashboard, tool-ranker, tool-editor-contribution, tool-mikasa-bot-1, tool-maintun, tool-add-text, tool-wikibookassamese-book.php (none of them had commits) [[phab:T296022|T296022]] - [[phab:T315706|T315706]]
* 22:20 mutante: phabricator - disabling repository "tool-ranker"
* 22:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P34551 and previous config saved to /var/cache/conftool/dbconfig/20220912-220914-ladsgroup.json
* 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34550 and previous config saved to /var/cache/conftool/dbconfig/20220912-215407-ladsgroup.json
* 21:07 bmansurov@deploy1002: Finished deploy [airflow-dags/research@b9be20d]: (no justification provided) (duration: 00m 19s)
* 21:07 bmansurov@deploy1002: Started deploy [airflow-dags/research@b9be20d]: (no justification provided)
* 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 ([[phab:T312863|T312863]])', diff saved to https://phabricator.wikimedia.org/P34549 and previous config saved to /var/cache/conftool/dbconfig/20220912-210123-ladsgroup.json
* 20:57 TheresNoTime: closing UTC late backport window
* 20:56 samtar@deploy1002: Finished scap: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]] (duration: 05m 00s)
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:51 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:51 samtar@deploy1002: Started scap: Backport for [[gerrit:831549{{!}}Set track_total_hits to true]]
* 20:49 samtar@deploy1002: Finished scap: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]] (duration: 07m 27s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34548 and previous config saved to /var/cache/conftool/dbconfig/20220912-204617-ladsgroup.json
* 20:42 samtar@deploy1002: samtar and jdlrobson: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:42 samtar@deploy1002: Started scap: Backport for [[gerrit:831117{{!}}Enable Nearby on Hebrew and French Wikipedia (T246493)]]
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:40 samtar@deploy1002: Finished scap: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]] (duration: 06m 25s)
* 20:39 jhathaway: testing exim config change on mx1001.wikimedia.org
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:38 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dispatch-be1001.eqiad.wmnet
* 20:34 samtar@deploy1002: samtar and dani: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 20:34 samtar@deploy1002: Started scap: Backport for [[gerrit:830917{{!}}Deploy Research Incentive Survey to idwiki (T316466)]]
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:32 samtar@deploy1002: Finished scap: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]] (duration: 06m 12s)
* 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P34547 and previous config saved to /var/cache/conftool/dbconfig/20220912-203110-ladsgroup.json
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:26 samtar@deploy1002: samtar and ebernhardson: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 20:26 samtar@deploy1002: Started scap: Backport for [[gerrit:831548{{!}}Re-enable track_total_hits for elastic7 (T317374)]]
* 20:24 samtar@deploy1002: Finished scap: Backport for [[gerrit:830982{{!}}Create six more namespaces (three content namespaces and their corresponding three discussion namespaces) on the bn.wiktionary (T317424)]] (duration: 08m 14s)
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T314041|T314041]])', diff saved to https://phabricator.wikimedia.org/P34546 and previous config saved to /var/cache/conftool/dbconfig/20220912-202359-ladsgroup.json
* 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/servic