Server Admin Log: Difference between revisions

imported>Stashbot
(elukey: powercycle elastic1060 - T278630)
imported>Stashbot
(ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298555)', diff saved to https://phabricator.wikimedia.org/P28208 and previous config saved to /var/cache/conftool/dbconfig/20220521-010640-ladsgroup.json)
(377 intermediate revisions by 4 users not shown)
== 2021-03-27 ==
== 2022-05-21 ==
* 19:25 elukey: powercycle elastic1060 - [[phab:T278630|T278630]]
* 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28208 and previous config saved to /var/cache/conftool/dbconfig/20220521-010640-ladsgroup.json
* 06:10 ryankemper: [[phab:T267927|T267927]] `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_dumps_2020-03-26`
* 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:44 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 05:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 05:42 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 05:42 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28207 and previous config saved to /var/cache/conftool/dbconfig/20220521-010626-ladsgroup.json
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28206 and previous config saved to /var/cache/conftool/dbconfig/20220521-001014-ladsgroup.json
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 00:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 00:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
* 05:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 05:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
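The entries above share one shape: a time, an actor, and a message, with cookbook runs reporting `END (PASS)` or `END (FAIL)` plus an exit code. A minimal sketch of pulling the failed cookbook runs out of such lines follows. It assumes that shape holds for every entry, which is inferred only from this page, and the names `Entry`, `parse_entry` and `failed_cookbooks` are hypothetical helpers, not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape, inferred from the entries above: "* HH:MM actor: message".
ENTRY_RE = re.compile(r"^\* (?P<time>\d{2}:\d{2}) (?P<actor>[^:\s]+): (?P<message>.*)$")

# Cookbook END messages carry a PASS/FAIL status and an exit code.
END_RE = re.compile(
    r"^END \((?P<status>PASS|FAIL)\) - Cookbook (?P<cookbook>\S+) \(exit_code=(?P<code>\d+)\)"
)


class Entry(NamedTuple):
    time: str
    actor: str
    message: str


def parse_entry(line: str) -> Optional[Entry]:
    """Split one log bullet into time, actor and message; None for other lines."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["time"], m["actor"], m["message"])


def failed_cookbooks(lines):
    """Return (time, actor, cookbook, exit_code) for every failed cookbook run."""
    failures = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # headings, blank lines, revision metadata
        end = END_RE.match(entry.message)
        if end and end["status"] == "FAIL":
            failures.append((entry.time, entry.actor, end["cookbook"], int(end["code"])))
    return failures
```

Run over this day's entries, a filter like this would surface the repeated `sre.wdqs.data-reload` failures without reading the whole list by eye.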


== 2021-03-26 ==
== 2022-05-20 ==
* 22:27 tzatziki: reset password for Philroc
* 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28205 and previous config saved to /var/cache/conftool/dbconfig/20220520-224558-ladsgroup.json
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28204 and previous config saved to /var/cache/conftool/dbconfig/20220520-223054-ladsgroup.json
* 20:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 17:44 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - [[phab:T277795|T277795]] (duration: 01m 06s)
* 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 17:42 hashar@deploy1002: Finished scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]] (duration: 31m 43s)
* 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28203 and previous config saved to /var/cache/conftool/dbconfig/20220520-221550-ladsgroup.json
* 17:10 hashar@deploy1002: Started scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]]
* 22:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
* 15:40 Urbanecm: Delete `commonswiki:ip-autoblock:whitelist` cache key from memcached (wmf.36 moves the autoblock whitelist source, and it was deployed on commonswiki for a while, resulting in the cache key being empty)
* 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28202 and previous config saved to /var/cache/conftool/dbconfig/20220520-220046-ladsgroup.json
* 15:37 hnowlan: importing imposm3_0.11.0+git20201104.4758cf4-1_amd64.changes on apt1001
* 21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
* 21:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
* 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28201 and previous config saved to /var/cache/conftool/dbconfig/20220520-215514-ladsgroup.json
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
* 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
* 21:50 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
* 21:38 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye
* 13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
* 21:37 mutante: correction: mistake was to use FQDN [[phab:T307142|T307142]]
* 13:02 moritzm: reimaging theemin [[phab:T275873|T275873]]
* 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError [[phab:T307142|T307142]]
* 12:56 moritzm: drain ganeti1014
* 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError
* 12:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 21:34 mutante: reimaging gitlab1004 (insetup) to test partman recipe from gerrit:793534 - [[phab:T307142|T307142]]
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 21:34 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
* 12:37 moritzm: drain ganeti1013
* 21:33 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
* 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28198 and previous config saved to /var/cache/conftool/dbconfig/20220520-190633-ladsgroup.json
* 12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
* 19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:55 Urbanecm: Move `Help talk:Getting Started --> Help talk:Getting started` on enwiki with `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` ([[phab:T278350|T278350]])
* 19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:49 Urbanecm: Move `User talk:TheAafi/Help talk` to `Help talk:Getting Started` via `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` to fix an UBN task ([[phab:T278350|T278350]])
* 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 10:10 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts chlorine.eqiad.wmnet
* 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 10:02 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts chlorine.eqiad.wmnet
* 18:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts argon.eqiad.wmnet
* 17:55 mutante: [mwmaint1002:~] $ sudo mwscript initSiteStats.php --wiki=kcgwiki --update (to update statistics for latest wikipedia kcg) [[phab:T305281|T305281]]
* 09:49 filippo@deploy1002: Finished deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}} (duration: 00m 08s)
* 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 09:49 filippo@deploy1002: Started deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}}
* 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 09:46 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts argon.eqiad.wmnet
* 17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 09:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts acrab.codfw.wmnet
* 17:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5003.eqsin.wmnet with OS bullseye
* 09:43 moritzm: delete fermium in Ganeti (was still around, but powered down) [[phab:T224586|T224586]]
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
* 09:38 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts acrux.codfw.wmnet
* 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 09:36 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrab.codfw.wmnet
* 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 09:32 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrux.codfw.wmnet
* 17:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:31 filippo@deploy1002: Finished deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}} (duration: 00m 12s)
* 17:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
* 09:31 filippo@deploy1002: Started deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}}
* 16:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 09:28 moritzm: drain ganeti1012
* 16:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
* 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
* 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 08:38 moritzm: drain ganeti1010
* 16:37 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti5003.eqsin.wmnet with OS bullseye
* 08:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
* 16:33 robh: troubleshooting ganeti5003 ipmi failure via [[phab:T308211|T308211]]
* 08:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
* 16:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 06:11 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 16:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 16:09 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
* 05:06 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@bb5a072]: 0.3.68 (duration: 07m 31s)
* 16:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
* 05:00 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.68` on canary `wdqs1003`; proceeding to rest of fleet
* 16:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2069.codfw.wmnet with OS bullseye
* 04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@bb5a072]: 0.3.68
* 15:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
* 04:58 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.68`. Pre-deploy tests passing on canary `wdqs1003`
* 15:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2068.codfw.wmnet with OS bullseye
* 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
* 15:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
* 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2069.codfw.wmnet with OS bullseye
* 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
* 15:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2068.codfw.wmnet with OS bullseye
* 15:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1118 T', diff saved to https://phabricator.wikimedia.org/P28196 and previous config saved to /var/cache/conftool/dbconfig/20220520-151407-ladsgroup.json
* 15:11 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
* 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28195 and previous config saved to /var/cache/conftool/dbconfig/20220520-150838-root.json
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28194 and previous config saved to /var/cache/conftool/dbconfig/20220520-145334-root.json
* 14:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2066.codfw.wmnet with OS bullseye
* 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
* 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28193 and previous config saved to /var/cache/conftool/dbconfig/20220520-144212-ladsgroup.json
* 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P28192 and previous config saved to /var/cache/conftool/dbconfig/20220520-144111-ladsgroup.json
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28191 and previous config saved to /var/cache/conftool/dbconfig/20220520-143830-root.json
* 14:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
* 14:28 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28190 and previous config saved to /var/cache/conftool/dbconfig/20220520-142327-root.json
* 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28189 and previous config saved to /var/cache/conftool/dbconfig/20220520-142032-ladsgroup.json
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28188 and previous config saved to /var/cache/conftool/dbconfig/20220520-141316-ladsgroup.json
* 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
* 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28187 and previous config saved to /var/cache/conftool/dbconfig/20220520-141308-ladsgroup.json
* 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS bullseye
* 14:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28186 and previous config saved to /var/cache/conftool/dbconfig/20220520-140823-root.json
* 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28185 and previous config saved to /var/cache/conftool/dbconfig/20220520-135350-ladsgroup.json
* 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
* 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28184 and previous config saved to /var/cache/conftool/dbconfig/20220520-135319-root.json
* 13:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
* 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P28183 and previous config saved to /var/cache/conftool/dbconfig/20220520-134515-ladsgroup.json
* 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 13:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
* 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 1%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28182 and previous config saved to /var/cache/conftool/dbconfig/20220520-133815-root.json
* 13:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: [[phab:T308459|T308459]]
* 13:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: [[phab:T308459|T308459]]
* 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-tls
* 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=varnish-fe
* 13:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-be
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
* 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28181 and previous config saved to /var/cache/conftool/dbconfig/20220520-132307-ladsgroup.json
* 13:15 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye
* 12:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye
* 12:42 mforns@deploy1002: Finished deploy [airflow-dags/analytics@51a203f]: (no justification provided) (duration: 00m 07s)
* 12:42 mforns@deploy1002: Started deploy [airflow-dags/analytics@51a203f]: (no justification provided)
* 12:37 moritzm: copy prometheus-mcrouter-exporter from buster-wikimedia to bullseye-wikimedia (needed for [[phab:T308214|T308214]])
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28180 and previous config saved to /var/cache/conftool/dbconfig/20220520-123045-ladsgroup.json
* 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
* 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28179 and previous config saved to /var/cache/conftool/dbconfig/20220520-123037-ladsgroup.json
* 12:23 Amir1: killed refreshlinks suggestion in 10160
* 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
* 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28178 and previous config saved to /var/cache/conftool/dbconfig/20220520-121116-ladsgroup.json
* 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 12:10 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
* 11:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28177 and previous config saved to /var/cache/conftool/dbconfig/20220520-114234-ladsgroup.json
* 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28176 and previous config saved to /var/cache/conftool/dbconfig/20220520-114202-ladsgroup.json
* 11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28175 and previous config saved to /var/cache/conftool/dbconfig/20220520-113207-ladsgroup.json
* 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28174 and previous config saved to /var/cache/conftool/dbconfig/20220520-112449-ladsgroup.json
* 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
* 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 ([[phab:T298555|T298555]])', diff saved to https://phabricator.wikimedia.org/P28173 and previous config saved to /var/cache/conftool/dbconfig/20220520-111239-ladsgroup.json
* 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1131.eqiad.wmnet with reason: Maintenance
* 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 8:00:00 on 8 hosts with reason: Maintenance
* 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 8:00:00 on 8 hosts with reason: Maintenance
* 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2104.codfw.wmnet with reason: Maintenance
* 11:09 jynus: drop backupcheck users from m1>dbbackups
* 10:54 moritzm: uploaded cas 6.4.6.3-wmf11u1 to apt.wikimedia.org/bullseye
* 10:52 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
* 10:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
* 10:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:793737{{!}}Revert read new on frwiki for templatelinks migration]] (duration: 00m 51s)
* 10:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2063.codfw.wmnet with OS bullseye
* 09:39 volans@cumin1001: dbctl commit (dc=all): 'emergency depool', diff saved to https://phabricator.wikimedia.org/P28172 and previous config saved to /var/cache/conftool/dbconfig/20220520-093928-volans.json
* 09:34 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage
* 09:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2063.codfw.wmnet with reason: host reimage
* 09:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2063.codfw.wmnet with OS bullseye
* 08:54 vgutierrez: re-enabling puppet and repooling cp3060 - [[phab:T308797|T308797]] [[phab:T243167|T243167]]
* 08:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2062.codfw.wmnet with OS bullseye
* 08:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage
* 08:09 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: host reimage
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P28171 and previous config saved to /var/cache/conftool/dbconfig/20220520-080719-root.json
* 07:53 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2062.codfw.wmnet with OS bullseye
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P28170 and previous config saved to /var/cache/conftool/dbconfig/20220520-075215-root.json
* 07:52 jayme: imported kubeconform 0.4.13-1 to buster-,bullseye-wikimedia - [[phab:T306165|T306165]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P28169 and previous config saved to /var/cache/conftool/dbconfig/20220520-073712-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P28168 and previous config saved to /var/cache/conftool/dbconfig/20220520-072208-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P28167 and previous config saved to /var/cache/conftool/dbconfig/20220520-070704-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P28166 and previous config saved to /var/cache/conftool/dbconfig/20220520-065200-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P28164 and previous config saved to /var/cache/conftool/dbconfig/20220520-063656-root.json
* 06:03 moritzm: racadm racreset on ganeti5003
* 05:09 marostegui: dbmaint s1@eqiad [[phab:T298554|T298554]]
* 01:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28162 and previous config saved to /var/cache/conftool/dbconfig/20220520-010743-ladsgroup.json
* 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28161 and previous config saved to /var/cache/conftool/dbconfig/20220520-005237-ladsgroup.json
* 00:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon1003.wikimedia.org with OS bullseye
* 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P28160 and previous config saved to /var/cache/conftool/dbconfig/20220520-003732-ladsgroup.json
* 00:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
* 00:29 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
* 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bullseye
* 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T298560|T298560]])', diff saved to https://phabricator.wikimedia.org/P28159 and previous config saved to /var/cache/conftool/dbconfig/20220520-002227-ladsgroup.json


== 2021-03-25 ==
== 2022-05-19 ==
* 23:47 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/3D/package.json: No-op demo sync (duration: 01m 07s)
* 23:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host netmon1003.wikimedia.org with OS bullseye
* 23:37 stran@deploy1002: Synchronized README: (no justification provided) (duration: 01m 06s)
* 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bullseye
* 23:20 jhuneidi@deploy1002: Synchronized README: [[gerrit:674984{{!}}DEMO: README]] (duration: 01m 07s)
* 22:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:59 brennen: no patches for upcoming deploy window, but we'll be conducting a deployment training using DEMO patches to READMEs.
* 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon1003.mgmt.eqiad.wmnet with reboot policy FORCED
* 22:
* 22:07 robh: cp3060 idrac interface frozen, rebooted via power outlet control on [[phab:T243167|T243167]]
* 20:49 thcipriani: UTC late deploys done
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 bking@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:793128{{!}}zhwikiversity: Optimize logo per commons files (T308620)]] (duration: 00m 51s)
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 bking@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:792985


== 2021-03-24 ==
== 2022-05-18 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 23:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 23:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T303603|T303603]])', diff saved to https://phabricator.wikimedia.org/P28009 and previous config saved to /var/cache/conftool/dbconfig/20220518-235759-ladsgroup.json
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 23:53 mutante: webperf1001 - systemctl reset-failed
* 23:53 mutante: webperf1001/webperf2001 - re-enabling notifications in icinga that were disabled without comment (please don't do this, they keep being forgotten on a regular basis)
* 23:49 mutante: seaborgium - broken systemd state in Icinga since 23d - systemctl reset-failed
* 23:48 mutante: ms-be1063 - broken systemd state in Icinga since 19d - systemctl reset-failed
* 23:47 mutante: ms-be1054 - broken systemd state in Icinga since 19d - systemctl reset-failed
* 23:47 mutante: ms-be1036 - broken systemd state in Icinga since 15d - systemctl reset-failed
* 23:45 mutante: dumpsdata1002 - broken systemd state in Icinga since 23d - systemctl reset-failed
* 23:44 mutante: deploy2002 - broken systemd state in Icinga since 42d - systemctl reset-failed
* 23:43 mutante: an-db1002 - broken systemd state in Icinga since 48d - systemctl reset-failed
* 23:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after


== 2021-03-23 ==
== 2022-05-17 ==
* 22:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 23:36 ejegg: updated payments-wiki from {{Gerrit|590fac28}} to {{Gerrit|d9d63a3d}}
* 22:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:33 dwisehaupt: pushing {{Gerrit|60f9baaf50b}} to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - [[phab:T170321|T170321]]
* 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 ([[phab:T300774|T300774]])', diff saved to https://phabricator.wikimedia.org/P27896 and previous config saved to /var/cache/conftool/dbconfig/20220517-222904-ladsgroup.json
* 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s)
* 22:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace
* 22:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:05 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:27 ppchelko@deploy1002: Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s)
* 22:17 mwdebug-deploy@deploy1002
* 21:09 ppchelko@deploy1002: Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint
* 21:04 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 robh@cumin1001: START - Cookbook sre.dns.netbox


== 2021-03-22 ==
== 2022-05-16 ==
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 22:14 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: exim debugging
* 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 22:14 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: exim debugging
* 23:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2250.codfw.wmnet
* 21:47 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:47 robh: ganeti4002 rebooting for firmware update via [[phab:T307997|T307997]]
* 23:18 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: [[phab:T262612|T262612]]: Start glent m1 ab test (duration: 01m 53s)
* 21:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 23:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:31 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2250.codfw.wmnet
* 21:26 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2249.codfw.wmnet
* 21:14 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:52 mutante: decom mw2249
* 21:08 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2249.codfw.wmnet
* 21:07 cstone: civicrm revision changed from {{Gerrit|6d85f1cc}} to {{Gerrit|d45afdfc}}
* 21:08 sbassett: Deployed security patch for [[phab:T272244|T272244]]
* 21:05 mutante: gerrit2002 (in setup) - rebooting
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet,service=canary
* 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet,service=canary
* 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2279.codfw.wmnet,service=canary
* 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2278.codfw.wmnet,service=canary
* 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:50 mutante: gerrit2001 - restarted apache2 as well for consistency
* 20:41 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792141{{!}}Revert "cirrus: Turn on AB test of wbsearchentities profiles" (T306644)]] (duration: 00m 51s)
* 19:47 mutante: gerrit - restarting apache2 after we dropped MaxClients config line. This should make us fall back to Debian default MaxRequestWorkers. (since we use event MPM we should not be using MaxClients in the first place, says #httpd) ([[phab:T277127|T277127]])
* 20:36 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792197{{!}}yiwiktionary: Add localized mobile wordmark (T308411)]] and [[gerrit:792196{{!}}hewiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 50s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|25247c9cbba3d3741908164f2d15fb8497ce8b5e}}: hrwiki: Configure mentorship for Growth team features ([[phab:T275684|T275684]]) (duration: 01m 00s)
* 20:34 catrope@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-yi.svg: Config: [[gerrit:792197{{!}}yiwiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 49s)
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|951601f7a4c887f21e209b32dbd1cfd3da084816}}: Grant enwiki pagemovers the delete-redirect right ([[phab:T278131|T278131]]) (duration: 00m 59s)
* 20:33 catrope@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-he.svg: Config: [[gerrit:792196{{!}}hewiktionary: Add localized mobile wordmark (T308411)]] (duration: 00m 50s)
* 17:30 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 20:31 catrope@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:792192{{!}}yiwiktionary: Update desktop logo (T308411)]] (duration: 00m 51s)
* 16:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:29 catrope@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:792192{{!}}yiwiktionary: Update desktop logo (T308411)]] (duration: 00m 51s)
* 16:47 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:46 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 20:20 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791725{{!}}thwikibooks: Enable import (T308374)]] (duration: 00m 51s)
* 16:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:14 catrope@deploy1002: Synchronized wmf-config: Config: [[gerrit:792149{{!}}GrowthExperiments: Update campaigns benefit list config (T305659)]] (duration: 00m 51s)
* 16:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14990 and previous config saved to /var/cache/conftool/dbconfig/20210322-155808-root.json
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14989 and previous config saved to /var/cache/conftool/dbconfig/20210322-154304-root.json
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14988 and previous config saved to /var/cache/conftool/dbconfig/20210322-152800-root.json
* 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14987 and previous config saved to /var/cache/conftool/dbconfig/20210322-151257-root.json
* 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:26 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 18:42 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryBacklinksprop.php: Backport: [[gerrit:792140{{!}}ApiQueryBacklinksprop: Make sure the index setting exists (T306673)]] (duration: 00m 50s)
* 14:23 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:22 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P14986 and previous config saved to /var/cache/conftool/dbconfig/20210322-141146-marostegui.json
* 17:25 mutante: ACKing again all unhandled CRIT alerts on hosts with "dev" in their name - (imho dev hosts should not have prod CRIT alerts?)
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14985 and previous config saved to /var/cache/conftool/dbconfig/20210322-140800-root.json
* 15:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox-dev2001.wikimedia.org
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad - [[phab:T277771|T277771]]
* 15:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad
* 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14984 and previous config saved to /var/cache/conftool/dbconfig/20210322-135256-root.json
* 15:50 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14983 and previous config saved to /var/cache/conftool/dbconfig/20210322-133753-root.json
* 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14982 and previous config saved to /var/cache/conftool/dbconfig/20210322-132249-root.json
* 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox-dev2001.wikimedia.org
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 15:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:792229{{!}} Bumping portals to master (T128546)]] (duration: 00m 51s)
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 15:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:792229{{!}} Bumping portals to master (T128546)]] (duration: 00m 50s)
* 13:16 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 15:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts netbox2001-dev.wikimedia.org
* 12:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 15:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 12:20 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:39 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts netbox2001-dev.wikimedia.org
* 12:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 15:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P14981 and previous config saved to /var/cache/conftool/dbconfig/20210322-121924-marostegui.json
* 15:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update homer wmf-netbox plugin - ayounsi@cumin1001
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14980 and previous config saved to /var/cache/conftool/dbconfig/20210322-112954-root.json
* 15:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14979 and previous config saved to /var/cache/conftool/dbconfig/20210322-112707-root.json
* 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:22 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update homer wmf-netbox plugin - ayounsi@cumin1001
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:18 papaul: rebooting pfw3[a-b]-eqiad for Junos upgrade
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14978 and previous config saved to /var/cache/conftool/dbconfig/20210322-111451-root.json
* 14:50 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryBacklinksprop.php: Backport: Revert: [[gerrit:792136{{!}}ApiQueryBacklinksprop: Force the correct templatelinks index on read new (T306673)]] (duration: 00m 50s)
* 11:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:47 ladsgroup@deploy1002: scap failed: average error rate on 3/8 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14977 and previous config saved to /var/cache/conftool/dbconfig/20210322-111203-root.json
* 14:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14976 and previous config saved to /var/cache/conftool/dbconfig/20210322-105947-root.json
* 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14975 and previous config saved to /var/cache/conftool/dbconfig/20210322-105700-root.json
* 14:42 XioNoX: fix MTUs on asw-c-codfw
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 14:14 godog: bump disk space in prometheus codfw k8s-ml-serve (+30G)
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:14 Lucas_WMDE: UTC afternoon backport+config window done (just for the record; actual last backport was half an hour ago)
* 10:51 moritzm: installing libdbi-perl security updates
* 13:54 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:52 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:50 XioNoX: fix MTUs on asw-b-codfw
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:47 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14974 and previous config saved to /var/cache/conftool/dbconfig/20210322-104443-root.json
* 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14973 and previous config saved to /var/cache/conftool/dbconfig/20210322-104156-root.json
* 13:41 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:42 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 13:38 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791724{{!}}thwikibooks: set wgRestrictDisplayTitle to false (T308375)]] (duration: 00m 50s)
* 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:29 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript updateArticleCount.php thwikibooks --update # [[phab:T308376|T308376]] [basically instantaneous, 1558 articles]
* 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791722{{!}}thwikibooks: Add NS 104 and 106 to wgContentNamespaces (T308376)]] (duration: 00m 53s)
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:24 godog: free up space on thanos-be2001 on /var/log/spool/rsyslog
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791717{{!}}thwikibooks: Enable babel categorize (T308378)]] (duration: 00m 52s)
* 10:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 12:43 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 12:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:12 elukey: run homer for cr1/cr2 eqiad and codfw to add new iBGP session for the k8s ML clusters - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/661055
* 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config cleanup (duration: 00m 57s)
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:49 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config cleanup (duration: 00m 59s)
* 12:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config cleanup (duration: 01m 20s)
* 12:21 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 49s)
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for schema change', diff saved to https://phabricator.wikimedia.org/P14971 and previous config saved to /var/cache/conftool/dbconfig/20210322-093558-marostegui.json
* 12:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 48s)
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14970 and previous config saved to /var/cache/conftool/dbconfig/20210322-091534-root.json
* 12:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14969 and previous config saved to /var/cache/conftool/dbconfig/20210322-090030-root.json
* 12:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14968 and previous config saved to /var/cache/conftool/dbconfig/20210322-084527-root.json
* 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14967 and previous config saved to /var/cache/conftool/dbconfig/20210322-083023-root.json
* 12:13 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating kcgwiki ([[phab:T305279|T305279]])
* 08:13 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]] [[phab:T268435|T268435]]
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 12:11 urbanecm@deploy1002: Synchronized dblists: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 50s)
* 08:02 jayme: build and release docker-registry.discovery.wmnet/eventrouter:0.3.0-6, docker-registry.discovery.wmnet/fluent-bit:1.5.3-3, docker-registry.discovery.wmnet/ratelimit:1.5.1-s3
* 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:00 marostegui: Stop MySQL on db1085 to clone db1165 (lag will appear on s6 on wiki replicas) [[phab:T258361|T258361]]
* 12:10 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating kcgwiki ([[phab:T305279|T305279]]) (duration: 00m 49s)
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 to clone db1165', diff saved to https://phabricator.wikimedia.org/P14965 and previous config saved to /var/cache/conftool/dbconfig/20210322-080020-marostegui.json
* 11:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1081.eqiad.wmnet with reason: [[phab:T308267|T308267]]
* 07:51 elukey: stop/start mariadb instances on dbstore1004 to reduce buffer pool memory settings - [[phab:T273865|T273865]]
* 11:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1081.eqiad.wmnet with reason: [[phab:T308267|T308267]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14964 and previous config saved to /var/cache/conftool/dbconfig/20210322-073747-root.json
* 11:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14963 and previous config saved to /var/cache/conftool/dbconfig/20210322-072243-root.json
* 11:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change', diff saved to https://phabricator.wikimedia.org/P14962 and previous config saved to /var/cache/conftool/dbconfig/20210322-071430-marostegui.json
* 11:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14961 and previous config saved to /var/cache/conftool/dbconfig/20210322-070740-root.json
* 11:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14960 and previous config saved to /var/cache/conftool/dbconfig/20210322-065236-root.json
* 11:26 XioNoX: asw2-ulsfo fix MTU on 2 interfaces
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl [[phab:T276302|T276302]]', diff saved to https://phabricator.wikimedia.org/P14959 and previous config saved to /var/cache/conftool/dbconfig/20210322-063732-marostegui.json
* 11:09 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes: Backport: [[gerrit:792126{{!}}RestrictionStore: Add support for templatelinks migration (T308207)]] (duration: 00m 54s)
* 06:11 marostegui: Sanitize db1124 db2094 db1154: taywiki trvwiki mnwwiktionary
* 11:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:28 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:57 vgutierrez: test HAProxy 2.4.17 on cp4026 and cp4032
* 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:58 urbanecm: UTC morning B&C window done
* 07:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e9a00e8}}: GrowthExperiments: Update campaigns configuration ([[phab:T305443|T305443]], [[phab:T305659|T305659]], [[phab:T307521|T307521]]) (duration: 00m 50s)
* 07:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dc82dfa8}}: ptwikinews: Enable extension MediaSearch ([[phab:T299872|T299872]]) (duration: 00m 48s)
* 07:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:44 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|57d4a9c}}: thwikibooks: Enable quiz extension ([[phab:T308377|T308377]]) (duration: 00m 48s)
* 07:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3e04f86}}: thwikibooks: Add more namespaces to wgNamespacesToBeSearchedDefault ([[phab:T308373|T308373]]) (duration: 00m 48s)
* 07:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:36 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|67ce6ce}}: zhwikisource: Add NS100 to wgNamespacesToBeSearchedDefault ([[phab:T308393|T308393]]) (duration: 00m 50s)
* 07:18 dcausse: restarting blazegraph on wdqs1007 (BlazegraphFreeAllocatorsDecreasingRapidly)


== 2021-03-21 ==
== 2022-05-15 ==
* 10:25 _joe_: restarting gerrit on gerrit1001, using 45G of reserved memory
* 21:47 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 07s)
* 09:22 elukey: install apache2-bin-dbgsym on gerrit1001 - [[phab:T277127|T277127]]
* 21:46 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 08:50 qchris: Restarting apache on gerrit1001 again (all apache workers busy again) see [[phab:T277127|T277127]]
* 21:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 07s)
* 08:18 qchris: Restarting apache on gerrit1001 (all apache workers busy)
* 21:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 21:39 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 21:39 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 21:30 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 21:30 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)
* 21:14 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided) (duration: 00m 08s)
* 21:14 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@378e7ca]: (no justification provided)


== 2021-03-20 ==
== 2022-05-14 ==
* 00:22 tzatziki: altering emails for STei (WMF) and SGrabarczuk (WMF)
* 08:34 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1172', diff saved to https://phabricator.wikimedia.org/P27830 and previous config saved to /var/cache/conftool/dbconfig/20220514-083421-jynus.json
* 00:53 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Server need to be downgraded to stretch, on monday
* 00:53 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Server need to be downgraded to stretch, on monday


== 2021-03-19 ==
== 2022-05-13 ==
* 21:11 mutante: scandium - stop apache and rerun puppet which fails after reimaging because it tries to run an nginx on port 80 which is already used by apache [[phab:T268248|T268248]]
* 23:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1007.eqiad.wmnet with reason: Upgrade turnilo
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
* 23:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1007.eqiad.wmnet with reason: Upgrade turnilo
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
* 23:14 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@bf60521]: Staging deployment of turnilo 1.35 (duration: 00m 08s)
* 20:15 mutante: scandium - reimaging with buster
* 23:13 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@bf60521]: Staging deployment of turnilo 1.35
* 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
* 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1003.wikimedia.org
* 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
* 17:31 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1003.wikimedia.org
* 20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2245.codfw.wmnet
* 17:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1004.wikimedia.org
* 19:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2245.codfw.wmnet
* 17:24 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2244.codfw.wmnet
* 17:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudservices1004.wikimedia.org
* 19:53 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1002.wikimedia.org
* 17:24 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
* 19:50 mutante: testreduce1001 - confirmed MariaDB @@datadir is /srv/data/mysql and deleting /var/lib/mysql ([[phab:T277580|T277580]])
* 15:57 _joe_: uploading conftool 2.2.0 to buster, bullseye [[phab:T305824|T305824]] [[phab:T305582|T305582]] [[phab:T305607|T305607]] [[phab:T305638|T305638]] [[phab:T307905|T307905]] [[phab:T308100|T308100]]
* 19:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2244.codfw.wmnet
* 12:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2245.codfw.wmnet
* 12:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 19:39 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1002.wikimedia.org
* 12:37 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2244.codfw.wmnet
* 12:37 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet,service=canary
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2140 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P27824 and previous config saved to /var/cache/conftool/dbconfig/20220513-121832-marostegui.json
* 19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet,service=canary
* 12:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2252.codfw.wmnet,service=canary
* 11:59 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2251.codfw.wmnet,service=canary
* 11:57 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 19:24 mutante: deploy2002 - re-enabled puppet, reverted patch of scap-master-sync
* 11:47 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 18:46 mutante: deploy2002 - disable puppet, copy modified version of scap-master-sync over it that does not --exclude="**/cache/l10n/*.cdb" (for [[phab:T275826|T275826]])
* 11:40 moritzm: installing idp-test1002 [[phab:T308214|T308214]]
* 16:01 effie: upgrade memcached on mc-gp200*
* 10:55 moritzm: installing idp-test2002 [[phab:T308214|T308214]]
* 12:36 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 10:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 12:34 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual reimage
* 12:10 effie: upgrade memcached on mc1026,mc2026
* 10:18 vgutierrez: disable puppet on gerrit1001 to fix /etc/ssh/ssh_config
* 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:39 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:03 jynus: moving s2 database from db2101 to db2097 [[phab:T299920|T299920]]
* 11:36 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:59 moritzm: draining ganeti4002 [[phab:T307997|T307997]]
* 11:36 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:52 XioNoX: add init7 transit in drmrs
* 11:30 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:39 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4001.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:39 root@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4001.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4001.ulsfo.wmnet
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:18 Amir1: start of mwscript extensions/Echo/maintenance/removeOrphanedEvents.php --wiki=wikidatawiki --force ([[phab:T308084|T308084]])
* 11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 02:14 ejegg: updated payments-wiki from {{Gerrit|8f46af9d}} to {{Gerrit|590fac28}}
* 11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:42 moritzm: installing dbmonitor1002 [[phab:T224589|T224589]]
* 10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:11 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 10:05 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 09:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:36 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:22 elukey: upload alluxio 2.4.1 to thirdparty/bigtop15 on stretch/buster-wikimedia
* 07:16 ryankemper: [[phab:T275885|T275885]] `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (change hadn't been merged when I ran the agent earlier)
* 04:04 eileen: civicrm revision changed from {{Gerrit|99bf1c9210}} to {{Gerrit|39d24e8b0a}}, config revision is {{Gerrit|26b02db7ba}}
* 03:27 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
* 03:26 ryankemper: [[phab:T275885|T275885]] `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo run-puppet-agent'`
* 02:43 ryankemper: [[phab:T275885|T275885]] Revoking current `relforge` TLS cert in advance of generation of new cert: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean relforge.svc.eqiad.wmnet`
* 00:51 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: [[phab:T277772|T277772]] (duration: 00m 58s)
* 00:45 mutante: testreduce1001 - stop mysql; rsyncing /var/lib/mysql to /srv/data/mysql ([[phab:T277580|T277580]])


== 2021-03-18 ==
== 2022-05-12 ==
* 23:56 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Don't define a default icon ([[phab:T274199|T274199]]) (duration: 00m 57s)
* 21:56 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@a2bdc3e]: (no justification provided) (duration: 02m 08s)
* 23:38 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 00m 57s)
* 21:53 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@a2bdc3e]: (no justification provided)
* 23:35 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 00m 58s)
* 21:43 robh: cp306[23] returned to service, cp306[45] coming down for firmware update via [[phab:T243167|T243167]]
* 23:25 dduvall@deploy1002: Synchronized .pipeline: config: [[gerrit:673375{{!}}Use build environment HTTP proxy for APT sources (T277109)]] (duration: 01m 02s)
* 21:15 robh: cp306[01] returned to service, cp306[23] coming down for firmware update via [[phab:T243167|T243167]]
* 23:06 brennen: train status: 1.36.0-wmf.35 ([[phab:T274939|T274939]]) stable on all wikis after deploy of hotfix for [[phab:T277795|T277795]]
* 20:59 brennen: utc late backport & config window closed
* 22:53 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/specials/SpecialContributions.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 01m 07s)
* 20:50 robh: resuming last 6 esams cp host firmware updates via [[phab:T243167|T243167]]. cp306[01] going offline
* 22:30 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 20:50 Krinkle: krinkle@mwmaint1002$ mwscript refreshLinks.php --wiki commonswiki --category 'Media_needing_categories_requiring_human_attention' (approximately 2000 tiny pages)
* 22:29 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:25 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:37 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: (no justification provided) (duration: 01m 05s)
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.35
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:28 legoktm: re-enabled puppet on registry*
* 20:39 brennen@deploy1002: Finished scap: Backport for [[gerrit:791430]] viwiki: Enable "upload_by_url" for sysop (duration: 01m 36s)
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|44eddcc}}: hrwiki: Deploy Growth features to newcomers ([[phab:T275684|T275684]]) (duration: 01m 08s)
* 20:37 brennen@deploy1002: Started scap: Backport for [[gerrit:791430]] viwiki: Enable "upload_by_url" for sysop
* 18:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|179d9e5}}: mswiki: Enable Growth features in stealth mode ([[phab:T277562|T277562]]; 2/2) (duration: 01m 08s)
* 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|179d9e5}}: mswiki: Enable Growth features in stealth mode ([[phab:T277562|T277562]]; 1/2) (duration: 01m 11s)
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:58 legoktm: disabled puppet on registry* for rolling out https://gerrit.wikimedia.org/r/672537
* 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:50 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|55aa6cb}}: tewiki: Enable Growth features in stealth mode ([[phab:T277491|T277491]]; 2/2) (duration: 01m 08s)
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2242.codfw.wmnet
* 20:32 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791424{{!}}ruwiktionary: Add localized mobile wordmark (T308233)]] (duration: 00m 50s)
* 17:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|55aa6cb}}: tewiki: Enable Growth features in stealth mode ([[phab:T277491|T277491]]; 1/2) (duration: 01m 10s)
* 20:31 brennen@deploy1002: Synchronized static/images/mobile/copyright/wiktionary-wordmark-ru.svg: Config: [[gerrit:791424{{!}}ruwiktionary: Add localized mobile wordmark (T308233)]] (duration: 00m 49s)
* 17:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|04342e9bb0765a6a58ad78bd7eaa380d4167f0c1}}: simplewiki: Enable Growth team features in stealth mode ([[phab:T277550|T277550]]) (duration: 01m 09s)
* 20:25 brennen@deploy1002: Finished scap: Backport for [[gerrit:785229]] Enable "upload_by_url" feature on zhwiki (duration: 01m 46s)
* 17:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|04342e9bb0765a6a58ad78bd7eaa380d4167f0c1}}: simplewiki: Enable Growth team features in stealth mode ([[phab:T277550|T277550]]) (duration: 01m 10s)
* 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:40 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 20:23 brennen@deploy1002: Started scap: Backport for [[gerrit:785229]] Enable "upload_by_url" feature on zhwiki
* 17:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2242.codfw.wmnet
* 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2241.codfw.wmnet
* 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2241.codfw.wmnet
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2240.codfw.wmnet
* 20:17 brennen@deploy1002: backport aborted: (duration: 02m 05s)
* 16:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2240.codfw.wmnet
* 20:17 brennen@deploy1002: prep aborted: (duration: 00m 01s)
* 16:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2239.codfw.wmnet
* 19:57 hashar: Restarting Gerrit
* 16:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2239.codfw.wmnet
* 19:53 mutante: gitlab2001 - systemctl start backup-restore - systemd[1]: Started GitLab Backup Restore. after gerrit:791410 for [[phab:T308089|T308089]]
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2242.codfw.wmnet
* 18:57 jelto: restart gitlab2001
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2241.codfw.wmnet
* 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2240.codfw.wmnet
* 18:26 krinkle@deploy1002: Synchronized w/static.php: {{Gerrit|Ic0a5eae4f721a16403071d1b2136cf23d78e4fa9}} (duration: 00m 49s)
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2239.codfw.wmnet
* 18:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 15:33 shdubsh: clean up dead letter queue and restart all logstashes
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:37 dcausse: repooling wdqs1005
* 18:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4001.ulsfo.wmnet with reason: host reimage
* 14:29 hashar: Restarting CI Jenkins for plugin upgrade
* 18:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4001.ulsfo.wmnet with reason: host reimage
* 13:49 elukey: reboot analytics1066
* 17:52 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:23 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/Wikibase/repo: [[gerrit:673108{{!}}languageLabelDescriptionAliases: use getLanguageNameByCode]] ([[phab:T275611|T275611]] [[phab:T277722|T277722]]) (duration: 01m 14s)
* 17:51 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 12:58 jbond42: upload cas_6.3.2 to apt buster-wikimedia
* 17:50 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
* 11:37 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 17:50 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@5047d7d]: (no justification provided) (duration: 00m 08s)
* 11:34 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 17:50 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@5047d7d]: (no justification provided)
* 11:25 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 17:50 razzi@deploy1002: Finished deploy [analytics/turnilo/deploy@9cfdfaf]: (no justification provided) (duration: 29m 32s)
* 11:24 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|896c9f019b17d1ad3a1589d377158ca2fb91ebaa}}: flaggedrevs: Disable multiple dimensions in hewikisource (duration: 01m 09s)
* 17:50 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
* 11:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/GrowthExperiments/includes/HomepageHooks.php: {{Gerrit|3b2aa1aa28e9d204f32ae937a84ec211137cbb2e}}: Remove variant C from list of valid variants ([[phab:T277727|T277727]]) (duration: 01m 09s)
* 17:47 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
* 11:16 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:46 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
* 11:14 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
* 11:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 17:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0005676e704cad907655a4a0bca7bd2164714b1c}}: GrowthExperiments: set $wgGEHomepageNewAccountVariants to D only ([[phab:T277727|T277727]]) (duration: 01m 10s)
* 17:43 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 11:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: NOOP: {{Gerrit|e7f5eac}}: Enable CentralAuth IRC feed in beta cluster ([[phab:T277432|T277432]]) (duration: 01m 12s)
* 17:31 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1006.eqiad.wmnet with OS buster
* 09:13 _joe_: hard reboot of snapshot1005
* 17:26 jmm@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 09:04 _joe_: attempted reboot of snapshot1005, read-only filesystem and probably disks are broken beyond repair
* 17:21 razzi@deploy1002: Started deploy [analytics/turnilo/deploy@9cfdfaf]: (no justification provided)
* 08:27 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - [[phab:T272836|T272836]]
* 17:08 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 17:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1006.eqiad.wmnet with reason: host reimage
* 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 16:57 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1006.eqiad.wmnet with reason: host reimage
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14946 and previous config saved to /var/cache/conftool/dbconfig/20210318-080258-root.json
* 16:53 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Attempting OS upgrade
* 08:02 akosiaris: reimage ml-serve1004 to debug a docker volume_group issue
* 16:53 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Attempting OS upgrade
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14945 and previous config saved to /var/cache/conftool/dbconfig/20210318-074754-root.json
* 16:35 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1006.eqiad.wmnet with OS buster
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14944 and previous config saved to /var/cache/conftool/dbconfig/20210318-073250-root.json
* 16:21 mutante: gitlab2001 - trying to stop 'puma' for debugging [[phab:T308089|T308089]]
* 07:20 dcausse: depooling & restarting blazegraph on wdqs1005
* 16:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:19 marostegui: Deploy schema change on s4 codfw master, lag will appear - [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 16:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14943 and previous config saved to /var/cache/conftool/dbconfig/20210318-071747-root.json
* 16:06 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 16:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labstore1006.wikimedia.org
* 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 15:57 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host labstore1006.wikimedia.org
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1161 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14942 and previous config saved to /var/cache/conftool/dbconfig/20210318-063241-marostegui.json
* 15:57 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2120', diff saved to https://phabricator.wikimedia.org/P14941 and previous config saved to /var/cache/conftool/dbconfig/20210318-062201-marostegui.json
* 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labstore1007.wikimedia.org
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P14940 and previous config saved to /var/cache/conftool/dbconfig/20210318-060445-marostegui.json
* 15:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host labstore1005.eqiad.wmnet
* 03:46 andrewbogott: restarting slapd on seaborgium, serpens, and r-o ldap replicas (we're getting irregular connection failures)
* 15:06 razzi@cumin1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 00:05 eileen: tools revision changed from {{Gerrit|b7b4060c30}} to {{Gerrit|ef54260b0d}}
* 15:05 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1008.eqiad.wmnet with reason: host reimage
* 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P27819 and previous config saved to /var/cache/conftool/dbconfig/20220512-145554-root.json
* 14:48 razzi@cumin1001: conftool action : set/pooled=inactive; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:48 razzi@cumin1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:47 razzi@cumin1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:45 moritzm: installing gnupg2 updates from Bullseye point release
* 14:44 razzi@cumin1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
* 14:43 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1008.eqiad.wmnet with OS buster
* 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P27818 and previous config saved to /var/cache/conftool/dbconfig/20220512-144050-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27817 and previous config saved to /var/cache/conftool/dbconfig/20220512-143954-root.json
* 14:33 razzi@cumin1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
* 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P27816 and previous config saved to /var/cache/conftool/dbconfig/20220512-142546-root.json
* 14:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1009.eqiad.wmnet with OS buster
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27815 and previous config saved to /var/cache/conftool/dbconfig/20220512-142450-root.json
* 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P27814 and previous config saved to /var/cache/conftool/dbconfig/20220512-141042-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27813 and previous config saved to /var/cache/conftool/dbconfig/20220512-140946-root.json
* 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1164.eqiad.wmnet with OS bullseye
* 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1141 depooling: Maint', diff saved to https://phabricator.wikimedia.org/P27812 and previous config saved to /var/cache/conftool/dbconfig/20220512-135848-root.json
* 13:55 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1009.eqiad.wmnet with reason: host reimage
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27811 and previous config saved to /var/cache/conftool/dbconfig/20220512-135442-root.json
* 13:52 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1009.eqiad.wmnet with reason: host reimage
* 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1164.eqiad.wmnet with reason: host reimage
* 13:48 moritzm: installing ffmpeg security updates
* 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1164.eqiad.wmnet with reason: host reimage
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27809 and previous config saved to /var/cache/conftool/dbconfig/20220512-133938-root.json
* 13:38 tgr: EU mid-day deploys done
* 13:37 tgr@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/AddLink/ServiceLinkRecommendationProvider.php: Backport: [[gerrit:791251{{!}}Send sections_to_exclude in the POST body (T308186)]] (duration: 00m 49s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:34 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1164.eqiad.wmnet with OS bullseye
* 13:30 tgr@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 13:30 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1009.eqiad.wmnet with OS buster
* 13:28 tgr@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 13:26 tgr@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: After optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27808 and previous config saved to /var/cache/conftool/dbconfig/20220512-132434-root.json
* 13:23 tgr@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 13:21 tgr@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 13:19 tgr@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 13:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1007.eqiad.wmnet with OS buster
* 13:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1004.eqiad.wmnet with OS buster
* 12:45 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1007.eqiad.wmnet with reason: host reimage
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for optimizing recentchanges', diff saved to https://phabricator.wikimedia.org/P27807 and previous config saved to /var/cache/conftool/dbconfig/20220512-124406-marostegui.json
* 12:43 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 12:42 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1007.eqiad.wmnet with reason: host reimage
* 12:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1004.eqiad.wmnet with reason: host reimage
* 12:38 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 12:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1004.eqiad.wmnet with reason: host reimage
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:30 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.10/includes/api/ApiQueryInfo.php: Backport: [[gerrit:791252{{!}}ApiQueryInfo: Force PRIMARY index on templatelinks (T308207)]] (duration: 00m 50s)
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:28 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27806 and previous config saved to /var/cache/conftool/dbconfig/20220512-122707-marostegui.json
* 12:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:24 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 12:20 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1007.eqiad.wmnet with OS buster
* 12:17 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 12:14 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1005.eqiad.wmnet with OS buster
* 12:12 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores1004.eqiad.wmnet with OS buster
* 12:12 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 12:04 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
* 12:00 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
* 11:57 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27805 and previous config saved to /var/cache/conftool/dbconfig/20220512-115445-marostegui.json
* 11:51 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
* 11:50 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
* 11:46 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
* 11:43 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1005.eqiad.wmnet with reason: host reimage
* 11:40 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1005.eqiad.wmnet with reason: host reimage
* 11:21 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS bullseye
* 11:17 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1005.eqiad.wmnet with OS buster
* 11:14 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test1002.wikimedia.org
* 10:55 jmm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27804 and previous config saved to /var/cache/conftool/dbconfig/20220512-105432-marostegui.json
* 10:50 jmm@cumin1001: START - Cookbook sre.dns.netbox
* 10:50 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host idp-test1002.wikimedia.org
* 10:46 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2002.wikimedia.org
* 10:45 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS bullseye
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27803 and previous config saved to /var/cache/conftool/dbconfig/20220512-103333-marostegui.json
* 10:19 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS bullseye
* 10:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS bullseye
* 10:11 moritzm: installing Apache 2.4.53 updates on bullseye
* 09:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1002.eqiad.wmnet with OS buster
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27802 and previous config saved to /var/cache/conftool/dbconfig/20220512-094642-marostegui.json
* 09:36 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1003.eqiad.wmnet with OS buster
* 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1002.eqiad.wmnet with reason: host reimage
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27800 and previous config saved to /var/cache/conftool/dbconfig/20220512-091706-marostegui.json
* 09:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1002.eqiad.wmnet with reason: host reimage
* 09:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1003.eqiad.wmnet with reason: host reimage
* 09:03 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1003.eqiad.wmnet with reason: host reimage
* 08:52 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores1002.eqiad.wmnet with OS buster
* 08:45 jmm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:40 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ores1003.eqiad.wmnet with OS buster
* 08:32 jmm@cumin1001: START - Cookbook sre.dns.netbox
* 08:31 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host idp-test2002.wikimedia.org
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27799 and previous config saved to /var/cache/conftool/dbconfig/20220512-081814-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27798 and previous config saved to /var/cache/conftool/dbconfig/20220512-075703-marostegui.json
* 07:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1001.eqiad.wmnet with OS buster
* 07:34 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 07:33 marostegui: dbmaint s7@codfw [[phab:T308206|T308206]]
* 07:32 marostegui: dbmaint s6@eqiad [[phab:T308206|T308206]]
* 07:32 marostegui: dbmaint s6@codfw [[phab:T308206|T308206]]
* 07:29 marostegui: dbmaint s3@codfw [[phab:T308206|T308206]]
* 07:29 marostegui: dbmaint s3@eqiad [[phab:T308206|T308206]]
* 07:18 marostegui: dbmaint s7@codfw [[phab:T308206|T308206]]
* 07:16 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1001.eqiad.wmnet with reason: host reimage
* 07:08 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:791107{{!}}Enable Section Translation in cs, el, he, ko, sw and tr WPs (T304855 T304854 T298239 T304863 T304853 T304828)]] (duration: 00m 51s)
* 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores1001.eqiad.wmnet with reason: host reimage
* 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:44 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores1001.eqiad.wmnet with OS buster
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27797 and previous config saved to /var/cache/conftool/dbconfig/20220512-063217-marostegui.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1127 to test 10.6 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27796 and previous config saved to /var/cache/conftool/dbconfig/20220512-062241-marostegui.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1127 with low weight [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27795 and previous config saved to /var/cache/conftool/dbconfig/20220512-061305-marostegui.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 [[phab:T308126|T308126]]', diff saved to https://phabricator.wikimedia.org/P27794 and previous config saved to /var/cache/conftool/dbconfig/20220512-055918-marostegui.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2122 [[phab:T307501|T307501]]', diff saved to https://phabricator.wikimedia.org/P27793 and previous config saved to /var/cache/conftool/dbconfig/20220512-054138-marostegui.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 [[phab:T307501|T307501]]', diff saved to https://phabricator.wikimedia.org/P27792 and previous config saved to /var/cache/conftool/dbconfig/20220512-053444-marostegui.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 [[phab:T308202|T308202]]', diff saved to https://phabricator.wikimedia.org/P27791 and previous config saved to /var/cache/conftool/dbconfig/20220512-051106-marostegui.json
* 04:07 kart_: Updated cxserver to 2022-05-11-135122-production ([[phab:T307967|T307967]], [[phab:T306999|T306999]], [[phab:T298239|T298239]], [[phab:T304853|T304853]], [[phab:T307507|T307507]], [[phab:T308039|T308039]])
* 04:05 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 04:04 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 04:01 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 04:01 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 03:57 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 03:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply


== 2021-03-17 ==
== 2022-05-11 ==
* 23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c730dd5feb865a8325279cd4e76c133512f14251}}: idwiki: Deploy Growth features to newcomers ([[phab:T259024|T259024]]) (duration: 01m 08s)
* 22:28 robh: cp305[67] returned to service and all green in icinga, cp305[89] depooling for firmware update [[phab:T243167|T243167]]
* 23:40 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5c14e7d2045f0905f7e85b249e821bbe8d69c600}}: Define confirmed group in MediaWikiServices hook ([[phab:T275334|T275334]], [[phab:T277704|T277704]], [[phab:T275310|T275310]], [[phab:T275333|T275333]]) (duration: 01m 08s)
* 22:00 robh: cp305[45] returned to service and all green in icinga, cp305[67] depooling for firmware update [[phab:T243167|T243167]]
* 23:30 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/CirrusSearch/profiles/FallbackProfiles.config.php: Add fallback profile including glent m1 (duration: 01m 42s)
* 21:34 robh: cp30[23] returned to service and all green in icinga, cp30[45] depooling for firmware update [[phab:T243167|T243167]]
* 22:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 21:34 robh: cp50[23] returned to service and all green in icinga, cp50[45] depooling for firmware update [[phab:T243167|T243167]]
* 22:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 21:33 robh: cp50[23] returned to service and all green in icinga, cp50[45] depooling for firmware update
* 22:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 21:01 robh: cp305[23] going offline via [[phab:T243167|T243167]] for firmware updates (puppet agent disabled and depooled prior to reboot)
* 22:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: REIMAGE
* 20:28 tgr: [[phab:T304542|T304542]] running mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php hiwiki --verbose
* 20:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1184.eqiad.wmnet with reason: REIMAGE
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: REIMAGE
* 20:27 cjming: end of UTC late backport & config window
* 20:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 20:25 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/skins/Vector/resources: Backport: [[gerrit:790443{{!}}Factor out a separate scroll observer for the TOC A/B test, which should be fired separately from the page title observer used by the sticky header and TOC (T307952 T307345)]] (duration: 00m 52s)
* 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: REIMAGE
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: REIMAGE
* 20:11 ejegg: updated payments-wiki from {{Gerrit|cc2612d6}} to {{Gerrit|8f46af9d}}
* 20:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:42 andrew@deploy1002: Finished deploy [horizon/deploy@17ea780]: display volume usage summaries (duration: 03m 34s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: REIMAGE
* 20:07 ejegg: updated payments-wiki from {{Gerrit|f06e390b}} to {{Gerrit|cc2612d6}}
* 20:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1180.eqiad.wmnet with reason: REIMAGE
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: REIMAGE
* 20:05 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790395{{!}}Release DiscussionTools new topic tool to former a/b test wikis (T307410)]] (duration: 00m 54s)
* 20:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: REIMAGE
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:39 andrew@deploy1002: Started deploy [horizon/deploy@17ea780]: display volume usage summaries
* 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: REIMAGE
* 19:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: REIMAGE
* 19:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: REIMAGE
* 19:19 rzl: Added new `scap` identity to keyholder on deploy[1002,2002] - [[phab:T307351|T307351]]
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2238.codfw.wmnet
* 18:06 razzi: razzi@lvs1020:~$ systemctl stop pybal.service to apply change https://gerrit.wikimedia.org/r/c/operations/puppet/+/779915
* 20:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2238.codfw.wmnet
* 15:53 robh: firmware upgrade for ganeti4001 complete [[phab:T307997|T307997]] (bios, nics, idrac) and manually confirmed first 10G port is link active (it is) and is set to pxe
* 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: REIMAGE
* 15:50 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4001.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: REIMAGE
* 15:49 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti4001.mgmt.ulsfo.wmnet with reboot policy FORCED
* 20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2237.codfw.wmnet
* 15:46 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@378e7ca]: (no justification provided) (duration: 00m 03s)
* 19:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2237.codfw.wmnet
* 15:46 ebysans@deploy1002: Started deploy [airflow-dags/analytics@378e7ca]: (no justification provided)
* 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2236.codfw.wmnet
* 15:25 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@378e7ca]: (no justification provided) (duration: 00m 08s)
* 19:48 andrew@deploy1002: Finished deploy [horizon/deploy@3c2d1ee]: support VM resizing (duration: 03m 42s)
* 15:25 ebysans@deploy1002: Started deploy [airflow-dags/analytics@378e7ca]: (no justification provided)
* 19:44 andrew@deploy1002: Started deploy [horizon/deploy@3c2d1ee]: support VM resizing
* 15:15 robh: ganeti4001 updating all firmware revisions [[phab:T307997|T307997]]
* 19:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2236.codfw.wmnet
* 15:15 robh: ganeti4001 updating all firmware revisions
* 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2238.codfw.wmnet
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27789 and previous config saved to /var/cache/conftool/dbconfig/20220511-150038-marostegui.json
* 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2237.codfw.wmnet
* 15:00 vgutierrez: pool ats-be on cp4032
* 19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2236.codfw.wmnet
* 14:58 moritzm: installing qemu security updates on bullseye
* 19:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2235.codfw.wmnet
* 14:51 vgutierrez: depool ats-be on cp4032
* 19:29 mutante: testreduce1001 - rebooted, fdisk /dev/sdb, create partition table, create primary partition, mkfs.ext4 /dev/vdb1
* 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2008.codfw.wmnet with OS buster
* 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2235.codfw.wmnet
* 14:22 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 19:18 andrew@deploy1002: Finished deploy [horizon/deploy@8967660]: clean up a reverted hack (duration: 03m 25s)
* 14:08 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 19:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2234.codfw.wmnet
* 13:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2008.codfw.wmnet with reason: host reimage
* 19:14 andrew@deploy1002: Started deploy [horizon/deploy@8967660]: clean up a reverted hack
* 13:55 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2008.codfw.wmnet with reason: host reimage
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.35 (duration: 01m 26s)
* 13:54 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 19:05 mutante: ganeti1011 - rebooting VM testreduce1001 on ganeti level for [[phab:T277580|T277580]]
* 13:30 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ores2008.codfw.wmnet with OS buster
* 19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.35
* 13:25 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2007.codfw.wmnet with OS buster
* 19:02 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2234.codfw.wmnet
* 13:14 awight: EU backports complete
* 19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2233.codfw.wmnet
* 13:13 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 18:58 catrope@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/: sessionTick: Tick right away on sessionReset ([[phab:T277515|T277515]]) (duration: 01m 10s)
* 13:11 awight@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: [[gerrit:790436{{!}}Fix incomplete FlaggedRevs::binaryFlagging() implementation (T307972)]] (duration: 00m 51s)
* 18:52 catrope@deploy1002: Synchronized php-1.36.0-wmf.35/vendor/: Bump wikimedia/parsoid to 0.13.0-a28 ([[phab:T276649|T276649]]) (duration: 01m 18s)
* 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2233.codfw.wmnet
* 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2235.codfw.wmnet
* 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2234.codfw.wmnet
* 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2233.codfw.wmnet
* 12:54 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2007.codfw.wmnet with reason: host reimage
* 18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2232.codfw.wmnet
* 12:50 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2007.codfw.wmnet with reason: host reimage
* 18:31 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Define Portal and Portal talk namespace for niawiki ([[phab:T277671|T277671]]) (duration: 01m 11s)
* 12:45 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27786 and previous config saved to /var/cache/conftool/dbconfig/20220511-124226-marostegui.json
* 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2232.codfw.wmnet
* 12:23 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ores2007.codfw.wmnet with OS buster
* 18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2231.codfw.wmnet
* 12:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2055.codfw.wmnet with OS bullseye
* 18:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2231.codfw.wmnet
* 12:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2230.codfw.wmnet
* 12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:50 razzi: update firewall rules to allow mysql-sqoop in analytics-in4 to access clouddb1021 - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/672797
* 12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:47 ejegg: updated payments-wiki from {{Gerrit|0405ea1723}} to {{Gerrit|b06009c099}}
* 11:56 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790997{{!}}Set dewiki to read new for templatelinks (T306673)]] (duration: 00m 49s)
* 17:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2230.codfw.wmnet
* 11:39 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:29 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 11:26 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2006.codfw.wmnet with OS buster
* 17:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27782 and previous config saved to /var/cache/conftool/dbconfig/20220511-105416-marostegui.json
* 17:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 10:54 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2006.codfw.wmnet with reason: host reimage
* 16:50 andrew@deploy1002: Finished deploy [horizon/deploy@8c50f27]: more support for disabled flavors (duration: 02m 32s)
* 10:48 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2006.codfw.wmnet with reason: host reimage
* 16:48 andrew@deploy1002: Started deploy [horizon/deploy@8c50f27]: more support for disabled flavors
* 10:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 16:45 andrew@deploy1002: Finished deploy [horizon/deploy@8c50f27]: more support for disabled flavors (duration: 00m 07s)
* 10:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 16:45 andrew@deploy1002: Started deploy [horizon/deploy@8c50f27]: more support for disabled flavors
* 10:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 16:44 andrew@deploy1002: Finished deploy [horizon/deploy@e4fd934]: more support for disabled flavors (duration: 00m 07s)
* 10:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 16:44 andrew@deploy1002: Started deploy [horizon/deploy@e4fd934]: more support for disabled flavors
* 10:31 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 16:38 effie: upgrade memcached on mc1025, mc2025
* 10:31 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 16:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.35
* 10:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 16:04 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/RevisionRecord.php: (no justification provided) (duration: 00m 58s)
* 10:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 15:54 ejegg: updated standalone SmashPig deployment from {{Gerrit|58b070db1a}} to {{Gerrit|250a8570d1}}
* 10:25 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti4001.ulsfo.wmnet with OS bullseye
* 15:23 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dbmonitor1002.wikimedia.org
* 10:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2055.codfw.wmnet with reason: host reimage
* 14:56 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host dbmonitor1002.wikimedia.org
* 10:21 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2055.codfw.wmnet with reason: host reimage
* 14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
* 10:21 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ores2006.codfw.wmnet with OS buster
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14935 and previous config saved to /var/cache/conftool/dbconfig/20210317-142532-root.json
* 10:16 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 14:18 jayme: rebooting testreduce1001 for [[phab:T277580|T277580]]
* 10:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 14:17 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
* 10:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14934 and previous config saved to /var/cache/conftool/dbconfig/20210317-141028-root.json
* 10:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
* 14:02 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=sessionstore
* 10:08 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 14:02 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-analytics
* 10:06 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
* 14:01 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2f1b28] (duration: 04m 19s)
* 10:06 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 13:59 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
* 10:01 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1004.eqiad.wmnet
* 13:58 moritzm: added bullseye tftpboot environment [[phab:T275873|T275873]]
* 10:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 13:56 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2f1b28]
* 09:57 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye
* 13:56 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28] (thin): Regular analytics weekly train THIN [analytics/refinery@d2f1b28] (duration: 00m 06s)
* 09:56 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1004.eqiad.wmnet
* 13:56 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28] (thin): Regular analytics weekly train THIN [analytics/refinery@d2f1b28]
* 09:54 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14933 and previous config saved to /var/cache/conftool/dbconfig/20210317-135522-root.json
* 09:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
* 13:54 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
* 09:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
* 13:52 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
* 09:41 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2002.codfw.wmnet
* 13:52 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28]: Regular analytics weekly train [analytics/refinery@d2f1b28] (duration: 11m 36s)
* 09:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
* 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-analytics-external
* 09:35 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
* 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-logging-external
* 09:35 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2002.codfw.wmnet
* 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=api-gateway
* 09:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for registry2003.codfw.wmnet
* 13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=echostore
* 09:34 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for registry2003.codfw.wmnet
* 13:47 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
* 09:27 jayme: systemctl reset-failed ifup@ens5.service on registry2003 - [[phab:T273026|T273026]]
* 13:46 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
* 09:27 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2001.codfw.wmnet
* 13:41 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
* 09:24 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2055.codfw.wmnet with OS bullseye
* 13:40 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28]: Regular analytics weekly train [analytics/refinery@d2f1b28]
* 09:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14932 and previous config saved to /var/cache/conftool/dbconfig/20210317-134018-root.json
* 09:18 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup2001.codfw.wmnet
* 13:38 kormat: stopping db2137:s5 [[phab:T277632|T277632]]
* 09:15 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 13:33 kormat: stopping db2089:s5 [[phab:T277632|T277632]]
* 09:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 13:31 otto@deploy1002: Finished deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - [[phab:T207171|T207171]], [[phab:T263697|T263697]] (duration: 03m 24s)
* 09:07 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 13:27 otto@deploy1002: Started deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - [[phab:T207171|T207171]], [[phab:T263697|T263697]]
* 09:06 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 13:23 jynus: stopping s5 instance on db2099 and restoring from backup [[phab:T277632|T277632]]
* 09:06 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 13:17 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventstreams
* 09:05 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 13:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventstreams-internal
* 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mobileapps
* 09:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
* 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=wikifeeds
* 08:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
* 13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=termbox
* 08:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ores2009.codfw.wmnet with OS buster
* 13:12 moritzm: installing tiff security updates
* 08:46 moritzm: logging an example as part of Simon's onboarding
* 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=similar-users
* 08:40 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=push-notifications
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2009.codfw.wmnet with reason: host reimage
* 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=proton
* 08:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2054.codfw.wmnet with OS bullseye
* 12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=linkrecommendation
* 08:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2009.codfw.wmnet with reason: host reimage
* 12:44 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=blubberoid
* 08:12 marostegui: Rename revision_actor_temp on db1132 (s1) and db1114 (s8) [[phab:T307906|T307906]]
* 12:44 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=apertium
* 08:04 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2054.codfw.wmnet with reason: host reimage
* 12:11 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
* 08:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4004.ulsfo.wmnet with OS bullseye
* 12:10 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-main
* 08:00 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2054.codfw.wmnet with reason: host reimage
* 11:49 marostegui: Deploy schema change on s8, lag will appear on wiki replicas [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 07:51 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ores2009.codfw.wmnet with OS buster
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P14931 and previous config saved to /var/cache/conftool/dbconfig/20210317-114746-marostegui.json
* 07:47 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: host reimage
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14930 and previous config saved to /var/cache/conftool/dbconfig/20210317-114601-root.json
* 07:46 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2054.codfw.wmnet with OS bullseye
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14929 and previous config saved to /var/cache/conftool/dbconfig/20210317-113057-root.json
* 07:44 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: host reimage
* 11:20 jayme: switch restbase-async back to codfw (the newly initialized cluster)
* 07:22 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti4004.ulsfo.wmnet with OS bullseye
* 11:17 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
* 07:18 moritzm: drain ganeti4001 [[phab:T307997|T307997]]
* 11:17 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
* 07:05 moritzm: updating ganeti4* to Ganeti 3.0.1-1~bpo10+1 [[phab:T307997|T307997]]
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14928 and previous config saved to /var/cache/conftool/dbconfig/20210317-111553-root.json
* 06:40 marostegui: db2146 set global innodb_max_dirty_pages_pct = 75; [[phab:T307082|T307082]]
* 11:09 moritzm: restarting tomcat on idp.wikimedia.org
* 06:31 Amir1: mwscript maintenance/refreshImageMetadata.php --wiki=commonswiki --force --verbose --mediatype=AUDIO --mime audio/webm ([[phab:T226311|T226311]])
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14927 and previous config saved to /var/cache/conftool/dbconfig/20210317-110050-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27780 and previous config saved to /var/cache/conftool/dbconfig/20220511-053418-marostegui.json
* 09:59 moritzm: imported PHP 5.6.40 to thirdparty/php56 [[phab:T224589|T224589]]
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2146 [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P27779 and previous config saved to /var/cache/conftool/dbconfig/20220511-051703-marostegui.json
* 09:47 vgutierrez: restart varnish-fe on cp5011
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2146 [[phab:T301879|T301879]]', diff saved to https://phabricator.wikimedia.org/P27778 and previous config saved to /var/cache/conftool/dbconfig/20220511-051307-marostegui.json
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P14926 and previous config saved to /var/cache/conftool/dbconfig/20210317-092443-marostegui.json
* 01:41 mutante: gitlab2001 - starting backup-restore service that had failed on previous automatic run
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14925 and previous config saved to /var/cache/conftool/dbconfig/20210317-092357-root.json
* 01:33 ejegg: updated payments-wiki from {{Gerrit|c5be9c5d}} to {{Gerrit|f06e390b}}
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14924 and previous config saved to /var/cache/conftool/dbconfig/20210317-090853-root.json
* 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=recommendation-api
* 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=cxserver
* 09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14923 and previous config saved to /var/cache/conftool/dbconfig/20210317-090108-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 [[phab:T276302|T276302]]', diff saved to https://phabricator.wikimedia.org/P14922 and previous config saved to /var/cache/conftool/dbconfig/20210317-085852-marostegui.json
* 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14921 and previous config saved to /var/cache/conftool/dbconfig/20210317-085350-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14920 and previous config saved to /var/cache/conftool/dbconfig/20210317-084605-root.json
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14919 and previous config saved to /var/cache/conftool/dbconfig/20210317-083846-root.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14918 and previous config saved to /var/cache/conftool/dbconfig/20210317-083101-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14917 and previous config saved to /var/cache/conftool/dbconfig/20210317-081557-root.json
* 07:50 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - [[phab:T272836|T272836]]
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for schema change', diff saved to https://phabricator.wikimedia.org/P14916 and previous config saved to /var/cache/conftool/dbconfig/20210317-073403-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14915 and previous config saved to /var/cache/conftool/dbconfig/20210317-073024-root.json
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14914 and previous config saved to /var/cache/conftool/dbconfig/20210317-071520-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14913 and previous config saved to /var/cache/conftool/dbconfig/20210317-070017-root.json
* 06:52 marostegui: Stop MySQL on db1082 to clone db1161 (lag will appear on s5 on wikireplicas) - [[phab:T258361|T258361]]
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 to clone db1161 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14911 and previous config saved to /var/cache/conftool/dbconfig/20210317-065146-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2150 into s7 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14910 and previous config saved to /var/cache/conftool/dbconfig/20210317-064606-marostegui.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14909 and previous config saved to /var/cache/conftool/dbconfig/20210317-064513-root.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2150 to s7, depooled [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14908 and previous config saved to /var/cache/conftool/dbconfig/20210317-060358-marostegui.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P14907 and previous config saved to /var/cache/conftool/dbconfig/20210317-054206-marostegui.json
* 02:25 eileen: civicrm revision changed from {{Gerrit|8c137b94f0}} to {{Gerrit|99bf1c9210}}, config revision is {{Gerrit|ef2767ab91}}
* 01:55 eileen: civicrm revision changed from {{Gerrit|550be50105}} to {{Gerrit|8c137b94f0}}, config revision is {{Gerrit|ef2767ab91}}


== 2021-03-16 ==
== 2022-05-10 ==
* 23:56 krinkle@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/: {{Gerrit|I8619ab9e92b}}, [[phab:T277362|T277362]], [[phab:T275531|T275531]] (duration: 00m 58s)
* 20:13 mforns@deploy1002: Finished deploy [analytics/refinery@d2dfced] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2dfced] (duration: 06m 59s)
* 23:51 krinkle@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/Scribunto/: {{Gerrit|I84e8732d8d}} - tmp logging (duration: 00m 58s)
* 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1003.wikimedia.org
* 23:47 Krinkle: There is an uncommitted dirty diff in /srv/mediawiki-staging/php-1.36.0-wmf.34/extensions/WikimediaMaintenance/createExtensionTables.php
* 20:06 mforns@deploy1002: Started deploy [analytics/refinery@d2dfced] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2dfced]
* 23:31 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I1ca4f30c2}}, [[phab:T262612|T262612]] (duration: 00m 57s)
* 20:05 mforns@deploy1002: Finished deploy [analytics/refinery@d2dfced] (thin): Regular analytics weekly train THIN [analytics/refinery@d2dfced] (duration: 00m 07s)
* 23:22 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Icd6635cb302cc}}, [[phab:T277332|T277332]] (duration: 00m 58s)
* 20:05 mforns@deploy1002: Started deploy [analytics/refinery@d2dfced] (thin): Regular analytics weekly train THIN [analytics/refinery@d2dfced]
* 23:07 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I8d8c94d95c6}} (duration: 00m 59s)
* 20:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1004.wikimedia.org
* 23:03 twentyafterfour: applied hotfix to phabricator/src/infrastructure/customfield/storage/PhabricatorCustomFieldStorage.php and restarted php-fpm
* 19:55 andrew@cumin1001: START -
* 23:02 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I4097cbcb1d5}} (duration: 00m 59s)
* 22:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Ie24eb2077}}


== 2021-03-15 ==
== 2022-05-09 ==
* 23:31 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove back-compat from when IRC feed servers was a string ([[phab:T224579|T224579]]) (duration: 00m 59s)
* 21:58 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: new kernel round deux
* 23:24 legoktm@deploy1002: Synchronized wmf-config/: Define IRC feed servers as an array in <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php ([[phab:T224579|T224579]]) (duration: 00m 59s)
* 21:58 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: new kernel round deux
* 23:23 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Support having multiple IRC feed servers ([[phab:T224579|T224579]]) (duration: 00m 58s)
* 21:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: new kernel, round deux
* 23:13 legoktm@deploy1002: conftool action : set/pooled=inactive; selector: name=mw2225.codfw.wmnet
* 21:56 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: new kernel, round deux
* 23:11 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: GlobalWatchlist: allow watching up to 50 sites ([[phab:T276195|T276195]]) (duration: 01m 04s)
* 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2239.codfw.wmnet
* 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2238.codfw.wmnet
* 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2237.codfw.wmnet
* 21:19 cjming: end of UTC late backport & config window
* 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2236.codfw.wmnet
* 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:02 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@4300929]: convert_to_esbulk: Accept partial hour timestamps (duration: 03m 02s)
* 21:18 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/GrowthExperiments: Backport: [[gerrit:790406{{!}}Newcomer tasks: deploy AND topic selection to pilot wikis (T305399)]] (duration: 00m 54s)
* 20:59 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@4300929]: convert_to_esbulk: Accept partial hour timestamps
* 21:14 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/CampaignConfig.php: Backport: [[gerrit:790336{{!}}CampaignConfig: Avoid array_push() error]] (duration: 00m 51s)
* 20:55 legoktm: re-enabled puppet on kubestage2001, uncordoned kubestage2002
* 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2225.codfw.wmnet
* 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:57 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@82e0654]: prepare_mw_rev_score: Correct scores_export to bulk_ingest (duration: 01m 49s)
* 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:55 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@82e0654]: prepare_mw_rev_score: Correct scores_export to bulk_ingest
* 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2225.codfw.wmnet
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:53 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mw2224.codfw.wmnet
* 21:02 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/skins/Vector/resources: Backport: [[gerrit:790426{{!}}Adjust table of contents margins at 1000-1200 breakpoint (T307004)]] (duration: 00m 53s)
* 19:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2224.codfw.wmnet
* 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:43 eevans@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:37 eevans@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:27 eevans@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 20:36 cjming@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:790408{{!}}cirrus: Enable DeprecationLoggedHttps (T218994)]] (duration: 00m 51s)
* 18:56 dduvall@deploy1002: Synchronized .pipeline: config: [[gerrit:666492{{!}}Initial multiversion pipeline configuration]] [[gerrit:669807{{!}}pipeline: add building the webserver image]] ([[phab:T274182|T274182]]) (duration: 00m 59s)
* 20:32 cjming@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/Kartographer/modules/box: Backport: [[gerrit:790329{{!}}Refresh MediaWiki globals when loading mapdata (T307650)]] (duration: 00m 52s)
* 18:55 dduvall@deploy1002: Synchronized multiversion/: config: [[gerrit:666492{{!}}Initial multiversion pipeline configuration]] [[gerrit:669807{{!}}pipeline: add building the webserver image]] ([[phab:T274182|T274182]]) (duration: 00m 59s)
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e5a7284956e707ace94120e8224b262d5ef56c99}}: Enable DiscussionsTools for enwikibooks ([[phab:T276851|T276851]]) (duration: 00m 59s)
* 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:41 legoktm: puppet disabled on kubestage1001 for debugging docker-registry credentials
* 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:38 urbanecm@deploy1002: Synchronized wmf-config/config/enwikibooks.yaml: {{Gerrit|b6a8df04701f9a83643c93342183b448705477bd}}: Enable visualeditor on enwikibooks by default ([[phab:T276851|T276851]]; 2/2) (duration: 01m 00s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:37 foks: removing 1 file from eowiki, for legal compliance
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:35 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|b6a8df04701f9a83643c93342183b448705477bd}}: Enable visualeditor on enwikibooks by default ([[phab:T276851|T276851]]; 1/2) (duration: 00m 58s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b70a75c7530f4bc71fbb88b859329edb6dadf2a0}}: Configure default search namespaces for thwikisource ([[phab:T275280|T275280]]) (duration: 00m 59s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:18 hoo: Updated the Wikidata property suggester with data from the 2021-03-08 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:17 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: {{Gerrit|a7eb550498fd038fbc5d96d8a82a64c2ee5eb57a}}: Use master version of clientError.js (duration: 00m 58s)
* 19:25 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum6002.drmrs.wmnet
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a8234a9435a3acf669d44705fbcb19bf4dd5658e}}: Add deleterevision right to botadmin group on fawiki ([[phab:T277358|T277358]]) (duration: 00m 59s)
* 19:17 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum6002.drmrs.wmnet
* 18:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2223.codfw.wmnet
* 19:17 sukhe: depool durum6002.drmrs.wmnet (as part of [[phab:T307427|T307427]])
* 18:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2235.codfw.wmnet
* 19:11 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum6001.drmrs.wmnet
* 18:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2234.codfw.wmnet
* 19:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum6001.drmrs.wmnet
* 17:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2223.codfw.wmnet
* 19:04 sukhe: depool durum6001.drmrs.wmnet (as part of [[phab:T307427|T307427]])
* 17:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2222.codfw.wmnet
* 18:13 mutante: rebooting mwmaint2002 (not active maint server)
* 17:30 hnowlan: disabling puppet on aqs100[4-9].eqiad.wmnet to test change to password logic in puppet
* 18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reboot
* 17:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2222.codfw.wmnet
* 18:13 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reboot
* 17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2223.codfw.wmnet
* 18:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: reboot
* 17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2222.codfw.wmnet
* 18:06 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: reboot
* 17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2221.codfw.wmnet
* 18:05 mutante: etherpad - maintenance reboot - expect a short downtime
* 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2221.codfw.wmnet
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:03 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2221.codfw.wmnet
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:58 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2224.codfw.wmnet
* 17:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:58 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2220.codfw.wmnet
* 17:22 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:790345{{!}}Bumping portals to master (T304629)]] (duration: 00m 50s)
* 16:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2220.codfw.wmnet
* 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2224.codfw.wmnet
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:48 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2224.codfw.wmnet
* 17:22 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:790345{{!}}Bumping portals to master (T304629)]] (duration: 00m 52s)
* 16:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2220.codfw.wmnet
* 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2233.codfw.wmnet
* 17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2232.codfw.wmnet
* 17:10 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2231.codfw.wmnet
* 16:49 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:29 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet
* 16:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet
* 16:14 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1003.eqiad.wmnet
* 16:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
* 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh6002.wikimedia.org
* 16:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
* 16:10 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1003.eqiad.wmnet
* 16:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
* 16:07 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1049 to resolve CirrusSearchJVMGCOldPoolFlatlined
* 16:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
* 16:03 sukhe: depool doh6002 (as part of [[phab:T307427|T307427]])
* 16:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
* 16:02 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh6002.wikimedia.org
* 16:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
* 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:05 moritzm: draining ganeti2010
* 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
* 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
* 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:48 moritzm: draining ganeti2009
* 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh6001.wikimedia.org
* 15:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2007.codfw.wmnet
* 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2007.codfw.wmnet
* 15:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:33 moritzm: draining ganeti2007
* 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:27 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: REIMAGE
* 15:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:24 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: REIMAGE
* 15:35 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh6001.wikimedia.org
* 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14858 and previous config saved to /var/cache/conftool/dbconfig/20210315-151648-root.json
* 15:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh6001.wikimedia.org
* 15:16 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 15:35 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh6001.wikimedia.org
* 15:14 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 15:34 sukhe: depool doh6001 (as part of [[phab:T307427|T307427]])
* 15:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 14:05 taavi: UTC afternoon backport window done
* 15:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14857 and previous config saved to /var/cache/conftool/dbconfig/20210315-150144-root.json
* 14:04 taavi@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/ContentTranslation/app: Backport: [[gerrit:790328{{!}}CX3 Build 0.2.0+20220509 (T306643)]] (duration: 00m 51s)
* 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14856 and previous config saved to /var/cache/conftool/dbconfig/20210315-144641-root.json
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:36 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:36 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14855 and previous config saved to /var/cache/conftool/dbconfig/20210315-143137-root.json
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:28 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: old kernel :(
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P14854 and previous config saved to /var/cache/conftool/dbconfig/20210315-140809-marostegui.json
* 13:56 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: old kernel :(
* 14:04 dcausse: re-pooling wdqs1005
* 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14853 and previous config saved to /var/cache/conftool/dbconfig/20210315-135426-root.json
* 13:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: old kernel :(
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14852 and previous config saved to /var/cache/conftool/dbconfig/20210315-133921-root.json
* 13:52 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: old kernel :(
* 13:25 Urbanecm: Deploy security patch for [[phab:T152394|T152394]]
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14851 and previous config saved to /var/cache/conftool/dbconfig/20210315-132418-root.json
* 13:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2001.codfw.wmnet
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14849 and previous config saved to /var/cache/conftool/dbconfig/20210315-130914-root.json
* 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14848 and previous config saved to /var/cache/conftool/dbconfig/20210315-123930-marostegui.json
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/MobileFrontend/: {{Gerrit|41a2aaac8c7b6ee5ec05af6d051d541614eaba30}}: Revert "Rewite MoveLeadParagraphTransform based on mobile apps approach" ([[phab:T277302|T277302]]) (duration: 00m 58s)
* 13:48 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:780874{{!}}Newcomer tasks: deploy AND topic selection to pilot wikis (T305399)]] (duration: 00m 49s)
* 12:31 Lucas_WMDE: maintenance scripts for [[phab:T270249|T270249]] completed successfully, no more terms for deleted items found on stat1007
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:30 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/GrowthExperiments/: {{Gerrit|fa2abfab23c7030402336f8908d0988f37d8133b}}: Manual submodule update of GrowthExperiments repository ([[phab:T276966|T276966]]) (duration: 00m 59s)
* 13:41 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host centrallog2002.codfw.wmnet
* 12:29 Lucas_WMDE: RemoveDeletedItemsFromTermStore.php finished in 5m39s
* 13:41 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
* 12:23 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 5555,9593p [[phab:T270249|T270249]].ids {{!}} tr '\n' ',' {{!}} sed 's/,$//')" # [[phab:T270249|T270249]], remaining 4039 items
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:22 Lucas_WMDE: RemoveDeletedItemsFromTermStore.php finished in 8min
* 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 12:19 _joe_: depooled mw1347 for testing
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 555,5554p [[phab:T270249|T270249]].ids {{!}} tr '\n' ',' {{!}} sed 's/,$//')" # [[phab:T270249|T270249]], 5000 items
* 13:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 12:12 Lucas_WMDE: finished in 43s
* 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 55,554p [[phab:T270249|T270249]].ids {{!}} tr '\n' ',' {{!}} sed 's/,$//')" # [[phab:T270249|T270249]], 500 items
* 13:37 taavi@deploy1002: Synchronized php-1.39.0-wmf.10/extensions/ContentTranslation/modules/entrypoints: Backport: [[gerrit:789832{{!}}ULS entrypoint: Do not show current language, fix domain redirects (T307745 T298032)]] (duration: 00m 50s)
* 12:10 Lucas_WMDE: finished in 5.1s
* 13:36 taavi@deploy1002: Synchronized docroot/wwwportal/w/search-redirect.php: Config: [[gerrit:789972{{!}}search-redirect.php: Make sure the family is lowercased (T304629)]] (duration: 00m 51s)
* 12:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 5,54p [[phab:T270249|T270249]].ids {{!}} tr '\n' ',' {{!}} sed 's/,$//')" # [[phab:T270249|T270249]], 50 items
* 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14847 and previous config saved to /var/cache/conftool/dbconfig/20210315-115826-root.json
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:672371{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:50 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:672371{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14846 and previous config saved to /var/cache/conftool/dbconfig/20210315-114323-root.json
* 13:29 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:789974{{!}}rowiki: Fix canonical namespaces (T127607)]] (duration: 00m 51s)
* 11:34 moritzm: restarting FPM on mw canaries to pick up new libtiff
* 13:26 moritzm: failover ganeti master in codfw/test to ganeti-test2003
* 11:30 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:28 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14844 and previous config saved to /var/cache/conftool/dbconfig/20210315-112819-root.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:22 moritzm: installing tiff security updates
* 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
* 11:17 moritzm: installing golang-1.7 security updates
* 13:21 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:789889{{!}}ptwiki: Revoke 500KB uploading limitation (T307813)]] (duration: 00m 50s)
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14843 and previous config saved to /var/cache/conftool/dbconfig/20210315-111315-root.json
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:00 volans: upgraded spicerack on cumin1001 to 0.0.49-1+deb10u1
* 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P14842 and previous config saved to /var/cache/conftool/dbconfig/20210315-105855-marostegui.json
* 13:16 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:788777{{!}}Set log level to 'debug' for mediamoderation (T303312)]] (duration: 00m 50s)
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14841 and previous config saved to /var/cache/conftool/dbconfig/20210315-105820-root.json
* 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 10:56 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2001.codfw.wmnet with reason: test
* 12:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 10:55 volans@cumin2001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2001.codfw.wmnet with reason: test
* 12:54 moritzm: installing perf updates on stretch/buster hosts
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14840 and previous config saved to /var/cache/conftool/dbconfig/20210315-104316-root.json
* 12:46 moritzm: installing perf updates on bullseye hosts
* 10:42 moritzm: installing pygments security updates on buster
* 12:45 klausman@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 07s)
* 10:33 volans: upgraded spicerack on cumin2001 to 0.0.49-1+deb10u1
* 12:45 klausman@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14839 and previous config saved to /var/cache/conftool/dbconfig/20210315-102813-root.json
* 12:45 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores2002.codfw.wmnet with OS buster
* 10:26 kormat@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: schema change [[phab:T267767|T267767]]', diff saved to https://phabricator.wikimedia.org/P14838 and previous config saved to /var/cache/conftool/dbconfig/20210315-102648-kormat.json
* 12:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2012.codfw.wmnet
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14837 and previous config saved to /var/cache/conftool/dbconfig/20210315-101309-root.json
* 12:37 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2012.codfw.wmnet
* 10:11 kormat@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: schema change [[phab:T267767|T267767]]', diff saved to https://phabricator.wikimedia.org/P14836 and previous config saved to /var/cache/conftool/dbconfig/20210315-101143-kormat.json
* 12:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2011.codfw.wmnet
* 10:03 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: schema change [[phab:T267767|T267767]]', diff saved to https://phabricator.wikimedia.org/P14835 and previous config saved to /var/cache/conftool/dbconfig/20210315-100337-kormat.json
* 12:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2011.codfw.wmnet
* 10:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1114.eqiad.wmnet with reason: schema change [[phab:T267767|T267767]]
* 12:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2010.codfw.wmnet
* 10:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1114.eqiad.wmnet with reason: schema change [[phab:T267767|T267767]]
* 12:19 godog: depool thanos-fe1001 to test load theory wrt account-stats failures - [[phab:T307907|T307907]]
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P14834 and previous config saved to /var/cache/conftool/dbconfig/20210315-095607-marostegui.json
* 12:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2010.codfw.wmnet
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14833 and previous config saved to /var/cache/conftool/dbconfig/20210315-094920-root.json
* 12:14 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores2002.codfw.wmnet with reason: host reimage
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14832 and previous config saved to /var/cache/conftool/dbconfig/20210315-093416-root.json
* 12:10 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ores2002.codfw.wmnet with reason: host reimage
* 09:23 vgutierrez: rolling restart of LVS cluster to bump depool_threshold to 0.8 on text & upload clusters - [[phab:T274888|T274888]]
* 12:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1012.eqiad.wmnet
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14831 and previous config saved to /var/cache/conftool/dbconfig/20210315-091912-root.json
* 11:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1012.eqiad.wmnet
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14830 and previous config saved to /var/cache/conftool/dbconfig/20210315-090409-root.json
* 11:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1011.eqiad.wmnet
* 08:54 marostegui: Stop MySQL on db1136 [[phab:T277007|T277007]]
* 11:53 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1011.eqiad.wmnet
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 [[phab:T277007|T277007]]', diff saved to https://phabricator.wikimedia.org/P14829 and previous config saved to /var/cache/conftool/dbconfig/20210315-085409-marostegui.json
* 11:45 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ores2002.codfw.wmnet with OS buster
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14828 and previous config saved to /var/cache/conftool/dbconfig/20210315-083555-marostegui.json
* 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir6002.drmrs.wmnet
* 08:33 godog: swift eqiad-prod remove decom hosts from account/container rings - [[phab:T272836|T272836]] [[phab:T276193|T276193]]
* 11:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir6002.drmrs.wmnet
* 08:33 marostegui: Repool labsdb1009 [[phab:T276980|T276980]]
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir6001.drmrs.wmnet
* 07:22 elukey: powercycle ms-be1038 - no ssh, no tty available in mgmt serial console, irrecoverable error saved in ilo's system logs
* 11:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1010.eqiad.wmnet
* 11:12 _joe_: removing stale files from config-master on puppetmaster2001; this could cause some flapping confd alerts
* 11:11 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1010.eqiad.wmnet
* 11:10 _joe_: removing stale files from config-master on puppetmaster1001; this could cause some flapping confd alerts
* 11:07 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-fe1010.eqiad.wmnet
* 11:07 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1010.eqiad.wmnet
* 11:05 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir6001.drmrs.wmnet
* 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus6001.drmrs.wmnet
* 10:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM prometheus6001.drmrs.wmnet
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6001.wikimedia.org
* 10:48 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6001.wikimedia.org
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow6001.drmrs.wmnet
* 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow6001.drmrs.wmnet
* 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install6001.wikimedia.org
* 10:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install6001.wikimedia.org
* 10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2052.codfw.wmnet with OS bullseye
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir3002.esams.wmnet
* 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir3002.esams.wmnet
* 09:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2052.codfw.wmnet with reason: host reimage
* 09:52 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2052.codfw.wmnet with reason: host reimage
* 09:42 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 05s)
* 09:42 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
* 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir3001.esams.wmnet
* 09:38 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 32s)
* 09:38 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
* 09:37 elukey@deploy1002: Finished deploy [ores/deploy@98a1b2e]: (no justification provided) (duration: 00m 08s)
* 09:36 elukey@deploy1002: Started deploy [ores/deploy@98a1b2e]: (no justification provided)
* 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir3001.esams.wmnet
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27768 and previous config saved to /var/cache/conftool/dbconfig/20220509-093032-marostegui.json
* 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping3002.esams.wmnet
* 09:25 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2052.codfw.wmnet with OS bullseye
* 09:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping3002.esams.wmnet
* 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus3001.esams.wmnet
* 09:09 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM prometheus3001.esams.wmnet
* 08:53 jelto: mw241[2-9]: scap pull
* 08:51 hashar: Gerrit is back and operational
* 08:47 hashar: Restarting Gerrit for plugin update
* 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus5001.eqsin.wmnet
* 08:43 hashar@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: Update Zuul plugin - [[phab:T307621|T307621]] (duration: 00m 07s)
* 08:43 hashar@deploy1002: Started deploy [gerrit/gerrit@94c5028]: Update Zuul plugin - [[phab:T307621|T307621]]
* 08:42 hashar: Restarting Gerrit on replica gerrit2001.wikimedia.org to update the Zuul plugin # [[phab:T307621|T307621]]
* 08:41 hashar@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: Update Zuul plugin - [[phab:T307621|T307621]] (duration: 00m 09s)
* 08:41 hashar@deploy1002: Started deploy [gerrit/gerrit@94c5028]: Update Zuul plugin - [[phab:T307621|T307621]]
* 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM prometheus5001.eqsin.wmnet
* 08:41 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=elastic2033.codfw.wmnet
* 08:40 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=ores2002.codfw.wmnet
* 08:40 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw2412.codfw.wmnet
* 08:30 dcausse: restarting blazegraph on wdqs1004 (BlazegraphFreeAllocatorsDecreasingRapidly)
* 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus4001.ulsfo.wmnet
* 08:22 Amir1: restarting confd on puppetmaster100[12]
* 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM prometheus4001.ulsfo.wmnet
* 08:09 godog: temp stop tegola-swift-container delete - [[phab:T307184|T307184]]
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27765 and previous config saved to /var/cache/conftool/dbconfig/20220509-080521-marostegui.json
* 08:03 ladsgroup@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1415.eqiad.wmnet
* 08:02 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1415.eqiad.wmnet
* 07:51 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw
* 07:50 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api-appserver,dc=codfw
* 07:37 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790016{{!}}Fix display issue of Timeline in cdo, gan, hak, wuu, yue and zh_classical (T188997)]] (duration: 05m 13s)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1172 to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27764 and previous config saved to /var/cache/conftool/dbconfig/20220509-073200-marostegui.json
* 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:20 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 07:16 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1172 with minimal weight to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27763 and previous config saved to /var/cache/conftool/dbconfig/20220509-070430-marostegui.json
* 06:23 Amir1: start of updateRestrictions.php on s5 ([[phab:T218446|T218446]])
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1172 with minimal weight to test 10.6 [[phab:T307546|T307546]]', diff saved to https://phabricator.wikimedia.org/P27762 and previous config saved to /var/cache/conftool/dbconfig/20220509-054823-marostegui.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172', diff saved to https://phabricator.wikimedia.org/P27761 and previous config saved to /var/cache/conftool/dbconfig/20220509-051426-marostegui.json
* 05:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 05:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 05:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:53 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: [[gerrit:790021{{!}}Set arwiki to read new in templatelinks migration (T306673)]] (duration: 05m 03s)
* 04:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:47 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 05m 04s)
* 04:40 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790020{{!}}Stop writing to rev_actor_temp table in group1 (T275246)]] (duration: 05m 06s)
* 04:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 04:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 04:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 04:31 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790021{{!}}Set arwiki to read new in templatelinks migration (T306673)]] (duration: 05m 10s)


== 2021-03-14 ==
== 2022-05-08 ==
* 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14827 and previous config saved to /var/cache/conftool/dbconfig/20210314-175751-root.json
* 07:16 godog: silence probedown for thumbor:8800 until monday
* 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14826 and previous config saved to /var/cache/conftool/dbconfig/20210314-174248-root.json
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14825 and previous config saved to /var/cache/conftool/dbconfig/20210314-172744-root.json
* 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14824 and previous config saved to /var/cache/conftool/dbconfig/20210314-171240-root.json
* 14:43 gehel: depool wdqs1005 and restart blazegraph - will keep depooled until this server has caught up on lag


== 2021-03-13 ==
== 2022-05-07 ==
* 19:02 Amir1: change default charset of all core tables in labstestwiki to binary ([[phab:T269348|T269348]])
* 21:29 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: seeking consistency between codfw1dev and eqiad1 (duration: 04m 04s)
* 18:53 Amir1: run schema changes for varbinary on wikitech ([[phab:T269348|T269348]])
* 21:25 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: seeking consistency between codfw1dev and eqiad1
* 17:38 twentyafterfour: restarted apache on gerrit1001 to resolve apache worker exhaustion, see [[phab:T277127|T277127]]
* 21:11 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: seeking consistency between codfw1dev and eqiad1 (duration: 05m 51s)
* 16:57 Reedy: gerrit web interface is slow/timing out
* 21:05 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: seeking consistency between codfw1dev and eqiad1
* 01:18 ryankemper: [[phab:T266470|T266470]] Re-enabled icinga service notifications for `Check no envoy runtime configuration is left persistent` on `wdqs100[9,10]`
* 15:53 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): testing scapping to cloudweb2002 (duration: 01m 14s)
* 01:04 ryankemper: [[phab:T266470|T266470]] merged https://gerrit.wikimedia.org/r/c/operations/dns/+/668255 && `ryankemper@authdns1001:~$ sudo authdns-update`
* 15:52 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): testing scapping to cloudweb2002
* 00:55 mutante: [wdqs1009:/etc/envoy] $ sudo /usr/local/sbin/build-envoy-config -c /etc/envoy/
* 15:49 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): (no justification provided) (duration: 00m 17s)
* 15:49 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): (no justification provided)
* 15:49 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): (no justification provided) (duration: 00m 05s)
* 15:48 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): (no justification provided)
* 15:38 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): testing scapping to cloudweb2002 (duration: 00m 33s)
* 15:38 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): testing scapping to cloudweb2002
* 15:29 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: seeking consistency between codfw1dev and eqiad1 (duration: 10m 55s)
* 15:18 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: seeking consistency between codfw1dev and eqiad1
* 15:17 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: seeking consistency between codfw1dev and eqiad1 (duration: 10m 55s)
* 15:06 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: seeking consistency between codfw1dev and eqiad1
* 04:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1016.eqiad.wmnet
* 04:13 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 04:07 andrew@cumin1001: START - Cookbook sre.dns.netbox
* 04:03 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1016.eqiad.wmnet


== 2021-03-12 ==
== 2022-05-06 ==
* 22:53 ryankemper: [[phab:T266470|T266470]] Manually disabled service notifications for `Check no envoy runtime configuration is left persistent`, will need to circle back on Monday to restore notifications
* 19:16 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
* 22:10 legoktm: imported mailman-puppetmaster.mailman.eqiad1.wikimedia.cloud facts to puppet-compiler
* 19:11 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
* 21:52 mutante: puppetmaster1001  sudo puppet cert clean testreduce.discovery.wmnet ([[phab:T266509|T266509]])
* 19:02 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1001.eqiad.wmnet
* 21:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2219.codfw.wmnet
* 18:56 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1001.eqiad.wmnet
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2219.codfw.wmnet
* 18:39 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2218.codfw.wmnet
* 18:28 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
* 20:32 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2218.codfw.wmnet
* 18:24 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
* 20:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2217.codfw.wmnet
* 18:18 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
* 20:22 eevans@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 18:16 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2217.codfw.wmnet
* 18:12 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet
* 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2219.codfw.wmnet
* 18:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
* 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2218.codfw.wmnet
* 18:07 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
* 20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2217.codfw.wmnet
* 18:02 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
* 19:47 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2376.codfw.wmnet,service=canary
* 17:58 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
* 19:47 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2374.codfw.wmnet,service=canary
* 17:54 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
* 19:47 ebernhardson: start in-place reindex testwiki in eqiad, codfw, cloudelastic cirrus clusters for [[phab:T269493|T269493]]
* 17:54 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2374.codfw.wmnet
* 17:49 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
* 19:41 mutante: mw2374, mw2376 - depooling to turn them into canaries
* 17:41 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
* 19:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2376.codfw.wmnet
* 17:36 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
* 19:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2374.codfw.wmnet
* 17:30 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
* 19:09 cstone: tools revision changed from {{Gerrit|532f8ecb33}} to {{Gerrit|b7b4060c30}}
* 16:41 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS bullseye
* 18:28 bblack: authdns1001.wikimedia.org,dns2001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
* 16:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1017.eqiad.wmnet with OS bullseye
* 18:24 bblack: dns[15]001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
* 16:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1021.eqiad.wmnet with OS bullseye
* 18:21 bblack: dns[34]001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS bullseye
* 18:03 mutante: depooling mw2244,mw2245 (API on old hardware), mw2229,mw2230 (app on old hardware) - [[phab:T277119|T277119]]
* 16:15 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Netbox bullseye on netbox-dev2002 (duration: 05m 39s)
* 18:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2245.codfw.wmnet
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
* 18:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2244.codfw.wmnet
* 16:09 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Netbox bullseye on netbox-dev2002
* 18:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
* 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
* 18:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
* 16:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS bullseye
* 17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
* 16:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS bullseye
* 17:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
* 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs1019.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14818 and previous config saved to /var/cache/conftool/dbconfig/20210312-143450-root.json
* 15:26 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs1018.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14817 and previous config saved to /var/cache/conftool/dbconfig/20210312-141947-root.json
* 15:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host aqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14816 and previous config saved to /var/cache/conftool/dbconfig/20210312-140443-root.json
* 15:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host aqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14815 and previous config saved to /var/cache/conftool/dbconfig/20210312-134940-root.json
* 15:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host aqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1088.eqiad.wmnet
* 15:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host aqs1019.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1088.eqiad.wmnet
* 15:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host aqs1018.mgmt.eqiad.wmnet with reboot policy FORCED
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P14814 and previous config saved to /var/cache/conftool/dbconfig/20210312-131033-marostegui.json
* 15:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host aqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
* 12:12 vgutierrez: restart ats-tls on cp3051
* 15:04 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Netbox bullseye on netbox-dev2002 (duration: 00m 04s)
* 11:55 effie: upgrade memcached on mc1022, mc2022
* 15:04 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Netbox bullseye on netbox-dev2002
* 11:22 hnowlan: corrected git_server for logstash-logback-encoder, cassandra/twcs and cassandra/metrics-collector on deploy1002
* 14:19 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Netbox bullseye on netbox-dev2002 (duration: 11m 29s)
* 09:45 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:13 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum5002.eqsin.wmnet
* 09:45 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:13 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh5002.wikimedia.org
* 09:44 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:12 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum5001.eqsin.wmnet
* 09:43 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:12 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh5001.wikimedia.org
* 09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
* 14:07 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Netbox bullseye on netbox-dev2002
* 09:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
* 14:04 ayounsi@deploy1002: Finished deploy [netbox-dev/deploy@7bbf659]: Netbox bullseye on netbox-dev2002 (duration: 00m 34s)
* 09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
* 14:04 ayounsi@deploy1002: Started deploy [netbox-dev/deploy@7bbf659]: Netbox bullseye on netbox-dev2002
* 09:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
* 14:03 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum5001.eqsin.wmnet
* 09:07 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling (duration: 01m 35s)
* 14:03 sukhe@cumin2002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM durum5002.eqsin.wmnet
* 09:05 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling
* 14:03 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh5001.wikimedia.org
* 09:00 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling (duration: 00m 09s)
* 14:02 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh5002.wikimedia.org
* 09:00 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling
* 14:02 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum5002.eqsin.wmnet
* 08:59 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling (duration: 00m 10s)
* 14:02 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum5002.eqsin.wmnet
* 08:59 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: [[phab:T273847|T273847]] export queries to relforge dag deployment - elastic-template handling
* 14:01 sukhe: depool Wikidough and durum in eqsin for [[phab:T307426|T307426]]
* 08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
* 13:59 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh3001.wikimedia.org
* 08:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
* 13:57 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh3002.wikimedia.org
* 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
* 13:56 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum3001.esams.wmnet
* 08:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
* 13:56 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum3002.esams.wmnet
* 08:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
* 13:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
* 13:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 08:01 moritzm: installing openjpeg2 security updates
* 13:40 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh3002.wikimedia.org
* 07:16 marostegui: Stop mysql on db2108 to clone db2148
* 13:40 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh3001.wikimedia.org
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14811 and previous config saved to /var/cache/conftool/dbconfig/20210312-071628-marostegui.json
* 13:40 ayounsi@deploy1002: Finished deploy [netbox-dev/deploy@7bbf659]: Netbox bullseye on netbox-dev2002 (duration: 01m 52s)
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14810 and previous config saved to /var/cache/conftool/dbconfig/20210312-071400-root.json
* 13:40 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum3001.esams.wmnet
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14809 and previous config saved to /var/cache/conftool/dbconfig/20210312-070219-marostegui.json
* 13:39 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum3002.esams.wmnet
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 60%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14808 and previous config saved to /var/cache/conftool/dbconfig/20210312-065857-root.json
* 13:39 sukhe: depool Wikidough and durum in esams for [[phab:T307424|T307424]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314 for table checking [[phab:T276742|T276742]]', diff saved to https://phabricator.wikimedia.org/P14807 and previous config saved to /var/cache/conftool/dbconfig/20210312-065008-marostegui.json
* 13:39 sukhe: depool Wikidough and durum in esams for [[phab:T307425|T307425]]
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 30%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14806 and previous config saved to /var/cache/conftool/dbconfig/20210312-064353-root.json
* 13:38 ayounsi@deploy1002: Started deploy [netbox-dev/deploy@7bbf659]: Netbox bullseye on netbox-dev2002
* 06:30 marostegui: Deploy schema change on s2 codfw master, lag will appear - [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum4001.ulsfo.wmnet
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14805 and previous config saved to /var/cache/conftool/dbconfig/20210312-062850-root.json
* 13:31 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh4002.wikimedia.org
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P14804 and previous config saved to /var/cache/conftool/dbconfig/20210312-061306-marostegui.json
* 13:31 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum4002.ulsfo.wmnet
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1088 from dbctl [[phab:T276025|T276025]]', diff saved to https://phabricator.wikimedia.org/P14803 and previous config saved to /var/cache/conftool/dbconfig/20210312-061118-marostegui.json
* 13:27 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh4001.wikimedia.org
* 04:14 eileen: tools revision changed from {{Gerrit|d64b2f8cee}} to {{Gerrit|532f8ecb33}}
* 13:27 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum4002.ulsfo.wmnet
* 01:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2215.codfw.wmnet
* 13:26 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum4001.ulsfo.wmnet
* 00:58 mutante: shutting down mw2215
* 13:26 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh4002.wikimedia.org
* 00:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2215.codfw.wmnet
* 13:24 sukhe@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh4001.wikimedia.org
* 13:21 ayounsi@deploy1002: Finished deploy [netbox-dev/deploy@7bbf659]: Netbox bullseye on netbox-dev2002 (duration: 10m 10s)
* 13:20 sukhe: depool Wikidough and durum in ulsfo for [[phab:T307425|T307425]]
* 13:11 ayounsi@deploy1002: Started deploy [netbox-dev/deploy@7bbf659]: Netbox bullseye on netbox-dev2002
* 12:59 ayounsi@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Netbox bullseye on netbox-dev2002 (duration: 00m 05s)
* 12:59 ayounsi@deploy1002: Started deploy [netbox/deploy@87a36a7]: Netbox bullseye on netbox-dev2002
* 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase-dev1004.eqiad.wmnet
* 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host restbase-dev1004.eqiad.wmnet
* 11:38 hnowlan: enabling postgres slow query log on maps replicas [[phab:T307671|T307671]]
* 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow3002.esams.wmnet
* 11:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow3002.esams.wmnet
* 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install3001.wikimedia.org
* 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install3001.wikimedia.org
* 11:12 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2058.codfw.wmnet with OS bullseye
* 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir5002.eqsin.wmnet
* 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir5002.eqsin.wmnet
* 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir5001.eqsin.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir5001.eqsin.wmnet
* 10:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2058.codfw.wmnet with reason: host reimage
* 10:49 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2058.codfw.wmnet with reason: host reimage
* 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow5002.eqsin.wmnet
* 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow5002.eqsin.wmnet
* 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install5001.wikimedia.org
* 10:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install5001.wikimedia.org
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast5002.wikimedia.org
* 10:32 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2058.codfw.wmnet with OS bullseye
* 10:30 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2058.codfw.wmnet with OS bullseye
* 10:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast5002.wikimedia.org
* 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir4002.ulsfo.wmnet
* 10:12 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir4002.ulsfo.wmnet
* 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir4001.ulsfo.wmnet
* 10:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1008.eqiad.wmnet
* 10:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir4001.ulsfo.wmnet
* 09:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1008.eqiad.wmnet
* 09:56 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1007.eqiad.wmnet
* 09:56 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2058.codfw.wmnet with OS bullseye
* 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow4002.ulsfo.wmnet
* 09:49 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1007.eqiad.wmnet
* 09:45 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow4002.ulsfo.wmnet
* 09:40 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
* 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install4001.wikimedia.org
* 09:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install4001.wikimedia.org
* 09:33 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
* 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4003.wikimedia.org
* 09:31 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet
* 09:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4003.wikimedia.org
* 09:27 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2057.codfw.wmnet with OS bullseye
* 09:25 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet
* 09:23 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 09:17 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 09:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 09:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2057.codfw.wmnet with reason: host reimage
* 09:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 09:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 09:00 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2057.codfw.wmnet with reason: host reimage
* 08:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 08:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 08:45 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 08:16 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye
* 07:49 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2057.codfw.wmnet with OS bullseye
* 07:42 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye
* 07:41 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2057.codfw.wmnet with OS bullseye
* 07:31 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye
* 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
* 07:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
* 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
* 07:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
* 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
* 07:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
* 01:51 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1415.eqiad.wmnet
* 01:50 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw1415.eqiad.wmnet
* 00:46 rook@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudvirt1016.eqiad.wmnet
* 00:46 rook@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1016.eqiad.wmnet


== 2021-03-11 ==
== 2022-05-05 ==
* 22:55 mutante: depooled mw2224 through mw2228 but not removing from DSH groups yet ([[phab:T277119|T277119]])
* 22:06 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2228.codfw.wmnet
* 22:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2227.codfw.wmnet
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2226.codfw.wmnet
* 22:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2225.codfw.wmnet
* 21:58 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734722{{!}}Add missing termbox codes from Wikibase (T277836)]] (duration: 00m 48s)
* 22:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2224.codfw.wmnet
* 21:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:35 brennen@deploy1002: Synchronized php-1.39.0-wmf.10/includes/user: Backport: [[gerrit:789332{{!}}Suppress "named" group when TempUser system is disabled (T307675)]] (duration: 00m 48s)
* 22:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 21:33 brennen@deploy1002: scap failed: average error rate on 7/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
* 22:47 mutante: running DNS cookbook in
* 21:26 brennen@deploy1002: Finished scap: Resuming previously interrupted sync-world (duration: 03m 47s)
* 21:25 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
* 21:24 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
* 21:22 brennen@deploy1002: Started scap: Resuming previously interrupted sync-world
* 21:21 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: new kernel
* 21:21 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: new kernel
* 21:21 jhathaway: reboot mx1001
* 21:18 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
* 21:18 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/


== 2021-03-10 ==
== 2022-05-04 ==
* 23:49 mholloway-shell@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/EventLogging: EventLogging: Stream always in sample if the user is in debugMode ([[phab:T276515|T276515]]) (duration: 01m 23s)
* 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P27565 and previous config saved to /var/cache/conftool/dbconfig/20220504-235020-ladsgroup.json
* 23:41 dwisehaupt: disabled silverpop daily run in process-control until utf8mb4 conversion completes on frdev1001
* 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27564 and previous config saved to /var/cache/conftool/dbconfig/20220504-235000-ladsgroup.json
* 23:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: REIMAGE
* 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:10 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: REIMAGE
* 23:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 23:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry1002.eqiad.wmnet
* 23:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 23:01 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry1002.eqiad.wmnet
* 23:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 22:55 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for
* 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27563 and previous config saved to /var/cache/conftool/dbconfig/20220504-234947-ladsgroup.json
* 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T307525|T307525]])', diff saved to https://phabricator.wikimedia.org/P27562 and previous config saved to /var/cache/conftool/dbconfig/20220504-233611-ladsgroup.json
* 23:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 23:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 ([[phab:T307525|T307525]]


== 2021-03-09 ==
== 2022-05-03 ==
* 23:59 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: [[phab:T307525|T307525]]', diff saved to https://phabricator.wikimedia.org/P27363 and previous config saved to /var/cache/conftool/dbconfig/20220503-235701-ladsgroup.json
* 23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE
* 23:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:04 mutante: phab1001 - manually running phab public task dump script after making changes to redirect stdout
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:42 elukey: reimaged an-worker1091 to buster
* 23:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:41 bstorm: depooled labsdb1009 [[phab:T276980|T276980]]
* 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 20:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE
* 23:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 20:25 bstorm: downtimed labsdb1009 so it doesn't keep paging [[phab:T276980|T276980]]
* 23:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre
* 20:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.


== 2021-03-08 ==
== 2022-05-02 ==
* 22:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1005.eqiad.wmnet with reason: REIMAGE
* 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb2002.codfw.wmnet with OS bullseye
* 22:34 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1005.eqiad.wmnet with reason: REIMAGE
* 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS bullseye
* 21:42 mholloway-shell@deploy1002: Synchronized wmf-config/
* 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
* 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage
* 22:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage
* 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host krb2002.codfw.wmnet with OS bullseye
* 22:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS bullseye
* 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:49 catrope@deploy1002: Finished scap: Backport: [[gerrit:788338{{!}}[TOC] Remove pointer-events:none on .sidebar-toc-link (T307271)]] and [[gerrit:788336{{!}}Video landing page: Show different title/body text on mobile (T303785)]] (duration: 11m 45s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:46 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 20:44 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation:


== 2021-03-07 ==
== 2022-05-01 ==
* 08:01 elukey: "megacli -LDSetProp -ForcedWB -Immediate -Lall -aAll" on analytics1066 - BBU looks fine, but the raid controller was using WriteThrough
* 23:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T298565|T298565]])', diff saved to https://phabricator.wikimedia.org/P27195 and previous config saved to /var/cache/conftool
 
== 2021-03-05 ==
* 23:16 legoktm: imported pygments 2.8.0+dfsg-1 to apt.wm.o buster-wikimedia component/pygments ([[phab:T276298|T276298]])
* 21:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:32 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 21:01 legoktm: updated udplog to 1.9 on mwlog1002.eqiad.wmnet and mwlog2002.codfw.wmnet
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deploy1001.eqiad.wmnet
* 20:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts deploy1001.eqiad.wmnet
* 20:15 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2002.codfw.wmnet
* 20:15 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2001.codfw.wmnet
* 20:12 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry2004.codfw.wmnet
* 20:04 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry2004.codfw.wmnet
* 20:04 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2004.codfw.wmnet
* 20:02 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2004.codfw.wmnet
* 19:30 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2004.codfw.wmnet
* 19:14 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2004.codfw.wmnet
* 19:04 mutante: phab1001 - running public_task_dump.py (from cron job) manually
* 18:50 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry2004.eqiad.wmnet
* 18:45 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry2004.eqiad.wmnet
* 18:45 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1021.eqiad.wmnet with reason: REIMAGE
* 18:43 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1021.eqiad.wmnet with reason: REIMAGE
* 18:23 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:18 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 16:58 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:54 effie: depool mw1276 and pool back
* 16:53 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 16:48 razzi: edit https://netbox.wikimedia.org/dcim/devices/2078/ device name from labsdb1012 to clouddb1021
* 16:36 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1036.eqiad.wmnet
* 16:30 razzi: delete non-mgmt interfaces for labsdb1012 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/
* 16:28 razzi: rename https://netbox.wikimedia.org/ipam/ip-addresses/734/ DNS name from labsdb1012.mgmt.eqiad.wmnet to clouddb1021.mgmt.eqiad.wmnet
* 16:22 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1036.eqiad.wmnet
* 16:17 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1012.eqiad.wmnet
* 16:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1086.eqiad.wmnet with reason: REIMAGE
* 16:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1086.eqiad.wmnet with reason: REIMAGE
* 16:07 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1012.eqiad.wmnet
* 15:56 razzi: stop mariadb on labsdb1012 to reimage and rename to clouddb1021: [[phab:T269211|T269211]]
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: REIMAGE
* 15:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: REIMAGE
* 15:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 elukey: drain + reimage analytics1073 and an-worker1086 to Debian Buster
* 14:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:20 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 13:52 marostegui: Rebuild some indexes on db2102
* 13:38 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'DEpool db1134', diff saved to https://phabricator.wikimedia.org/P14644 and previous config saved to /var/cache/conftool/dbconfig/20210305-133833-marostegui.json
* 13:24 marostegui: Check tables on db1134
* 12:31 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1035.eqiad.wmnet
* 12:24 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1035.eqiad.wmnet
* 11:28 marostegui: Temporarily set innodb_change_buffering = none on db1134 (s1) - [[phab:T263443|T263443]]
* 11:09 marostegui: Run check table on db2092, db2116, db2145, db2146 (there will be lag)
* 10:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1034.eqiad.wmnet
* 10:47 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1034.eqiad.wmnet
* 10:43 jakob@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:38 jakob@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1033.eqiad.wmnet
* 10:25 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1033.eqiad.wmnet
* 09:54 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 09:52 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:50 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:45 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:31 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 09:31 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 09:31 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1078.eqiad.wmnet with reason: REIMAGE
* 09:28 jayme: switched back active kubernetes staging cluster to eqiad
* 09:28 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 09:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1078.eqiad.wmnet with reason: REIMAGE
* 09:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1079.eqiad.wmnet with reason: REIMAGE
* 09:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1079.eqiad.wmnet with reason: REIMAGE
* 09:21 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be1034.eqiad.wmnet
* 09:19 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 09:12 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be1034.eqiad.wmnet
* 08:44 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2003.codfw.wmnet with reason: REIMAGE
* 08:42 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2003.codfw.wmnet with reason: REIMAGE
* 08:32 elukey: drain + reimage an-worker107[8,9] to Debian Buster
* 08:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1071.eqiad.wmnet with reason: REIMAGE
* 07:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1070.eqiad.wmnet with reason: REIMAGE
* 07:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1071.eqiad.wmnet with reason: REIMAGE
* 07:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1070.eqiad.wmnet with reason: REIMAGE
* 07:33 elukey: drain + reimage analytics107[0-1] to Debian Buster
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P14640 and previous config saved to /var/cache/conftool/dbconfig/20210305-065137-marostegui.json
* 06:17 legoktm: uploaded udplog 1.9 (buster-wikimedia) to apt.wikimedia.org ([[phab:T276421|T276421]])
* 00:59 legoktm: depooled registry1001/registry1002 (old stretch VMs) - [[phab:T272550|T272550]]
* 00:59 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1002.eqiad.wmnet
* 00:58 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1001.eqiad.wmnet
* 00:58 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry1004.eqiad.wmnet
* 00:57 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1004.eqiad.wmnet
* 00:57 legoktm@deploy1002: conftool action : set/pooled=inactive; selector: name=registry1004.eqiad.codfw
* 00:56 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry1004.eqiad.wmnet
* 00:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1004.eqiad.codfw
* 00:50 ryankemper: [[phab:T266470|T266470]] [ats] `sudo cumin 'A:cp-ats' 'sudo run-puppet-agent'`
* 00:47 ryankemper: [[phab:T266470|T266470]] [ats] Deploying new mappings for `query-preview.wikidata.org` microsite: https://gerrit.wikimedia.org/r/c/operations/puppet/+/668173/
* 00:41 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@4cc913e]: correct refinery-drop-older-than checksum (duration: 01m 34s)
* 00:39 ryankemper: [[phab:T266470|T266470]] Ran `sudo run-puppet-agent` on `miscweb1002` without issue; `/var/log/apache2/query*.log` looks as expected
* 00:39 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@4cc913e]: correct refinery-drop-older-than checksum
* 00:36 ryankemper: [[phab:T266470|T266470]] Deploying new `query-preview` microsite: https://gerrit.wikimedia.org/r/c/operations/puppet/+/668543
* 00:23 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2004.eqiad.wmnet
* 00:06 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2004.eqiad.wmnet
 
== 2021-03-04 ==
* 23:55 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry1004.eqiad.wmnet
* 23:39 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry1004.eqiad.wmnet
* 20:12 urbanecm@deploy1002: Synchronized wmf-config/config/hiwiki.yaml: {{Gerrit|c6b04cb1bc0b56823f96c59c93bd88f331f7d261}}: Enable Growth features on hiwiki in stealth mode ([[phab:T276450|T276450]]; 3/3) (duration: 00m 58s)
* 20:11 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|c6b04cb1bc0b56823f96c59c93bd88f331f7d261}}: Enable Growth features on hiwiki in stealth mode ([[phab:T276450|T276450]]; 2/3) (duration: 00m 57s)
* 20:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c6b04cb1bc0b56823f96c59c93bd88f331f7d261}}: Enable Growth features on hiwiki in stealth mode ([[phab:T276450|T276450]]; 1/3) (duration: 00m 57s)
* 20:08 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/includes/HomepageModules/Help.php: {{Gerrit|8cc65e3fd0b4a75599171b619108584526784853}}: cleanup: Remove help panel URL from Help homepage module ([[phab:T276450|T276450]]; [[phab:T273118|T273118]]) (duration: 00m 58s)
* 19:33 rzl: restarted apache and php7.0-fpm on doc1001 due to staleness
* 19:21 urbanecm@deploy1002: Synchronized wmf-config/config/sqwiki.yaml: {{Gerrit|377bc4fcfd8719281776661eae2297ac1242dae6}}: Enable Growth features on sqwiki in stealth mode ([[phab:T275550|T275550]]; 3/3) (duration: 00m 57s)
* 19:20 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|377bc4fcfd8719281776661eae2297ac1242dae6}}: Enable Growth features on sqwiki in stealth mode ([[phab:T275550|T275550]]; 2/3) (duration: 00m 57s)
* 19:19 dwisehaupt: replication restarted on frdb1004 after utf8mb4 conversion completed.
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|377bc4fcfd8719281776661eae2297ac1242dae6}}: Enable Growth features on sqwiki in stealth mode ([[phab:T275550|T275550]]; 1/3) (duration: 00m 57s)
* 19:11 jforrester@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/FlaggedRevs/frontend/specialpages/reports/ProblemChanges.php: [[phab:T276386|T276386]] Fix fatal calls to getConfig (duration: 01m 12s)
* 19:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:59 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:26 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2003.codfw.wmnet with reason: REIMAGE
* 18:25 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2003.codfw.wmnet with reason: REIMAGE
* 17:39 mutante: [deneb:~] $ sudo systemctl start cowbuilder_update_jessie-amd64
* 17:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:20 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on deploy1001.eqiad.wmnet with reason: decom
* 17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on deploy1001.eqiad.wmnet with reason: decom
* 17:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1032.eqiad.wmnet
* 16:59 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1032.eqiad.wmnet
* 16:56 tarrow@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 16:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1069.eqiad.wmnet with reason: REIMAGE
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1068.eqiad.wmnet with reason: REIMAGE
* 16:54 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1069.eqiad.wmnet with reason: REIMAGE
* 16:53 tarrow@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 16:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1068.eqiad.wmnet with reason: REIMAGE
* 16:47 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1031.eqiad.wmnet
* 16:33 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1031.eqiad.wmnet
* 16:23 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 16:20 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 16:13 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1026.eqiad.wmnet
* 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2145', diff saved to https://phabricator.wikimedia.org/P14635 and previous config saved to /var/cache/conftool/dbconfig/20210304-161226-marostegui.json
* 16:08 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1026.eqiad.wmnet
* 16:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1025.eqiad.wmnet
* 15:55 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1025.eqiad.wmnet
* 15:52 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 15:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1024.eqiad.wmnet
* 15:28 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 15:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1067.eqiad.wmnet with reason: REIMAGE
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1066.eqiad.wmnet with reason: REIMAGE
* 15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1067.eqiad.wmnet with reason: REIMAGE
* 15:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1066.eqiad.wmnet with reason: REIMAGE
* 15:21 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 15:12 elukey: drain + reimage analytics106[6,7] to Debian Buster
* 15:11 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1024.eqiad.wmnet
* 14:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1065.eqiad.wmnet with reason: REIMAGE
* 14:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1065.eqiad.wmnet with reason: REIMAGE
* 14:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:35 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:34 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:30 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 14:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts neon.eqiad.wmnet
* 14:18 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts neon.eqiad.wmnet
* 14:15 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts neon.eqiad.wmnet
* 14:15 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts neon.eqiad.wmnet
* 14:04 liw@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.33
* 13:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1064.eqiad.wmnet with reason: REIMAGE
* 13:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1063.eqiad.wmnet with reason: REIMAGE
* 13:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1064.eqiad.wmnet with reason: REIMAGE
* 13:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1063.eqiad.wmnet with reason: REIMAGE
* 13:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P14632 and previous config saved to /var/cache/conftool/dbconfig/20210304-134521-marostegui.json
* 13:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 13:44 volans: uploaded spicerack_0.0.49 to apt.wikimedia.org buster-wikimedia
* 13:35 moritzm: restarting mw canaries for libzstd update
* 13:32 elukey: drain + reimage analytics10[63,64] to Debian Buster
* 13:29 moritzm: installing libzstd security updates on Buster
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2146 to dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14631 and previous config saved to /var/cache/conftool/dbconfig/20210304-131301-marostegui.json
* 13:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1062.eqiad.wmnet with reason: REIMAGE
* 13:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1061.eqiad.wmnet with reason: REIMAGE
* 13:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1062.eqiad.wmnet with reason: REIMAGE
* 13:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1061.eqiad.wmnet with reason: REIMAGE
* 12:48 elukey: drain + reimage analytics10[61,62] to Debian Buster
* 12:45 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 12:40 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6fcbb9f]: (no justification provided) (duration: 00m 14s)
* 12:40 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:668108{{!}}Remove conflicting gadget configuration for hewiki (T276330)]] (duration: 01m 12s)
* 12:40 mbsantos@deploy1002: Started deploy [tilerator/deploy@6fcbb9f]: (no justification provided)
* 12:34 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 12:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db1115.eqiad.wmnet,dbmonitor1001.wikimedia.org with reason: Restart db1115 to fix memory leak
* 12:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db1115.eqiad.wmnet,dbmonitor1001.wikimedia.org with reason: Restart db1115 to fix memory leak
* 12:10 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 12:00 marostegui: Stop mysql on db1117:3321 to clone db1159
* 11:42 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2145 to s1 (and repool db2116) - [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14625 and previous config saved to /var/cache/conftool/dbconfig/20210304-114052-marostegui.json
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2145 into dbctl depooled - [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14624 and previous config saved to /var/cache/conftool/dbconfig/20210304-112848-marostegui.json
* 11:27 _joe_: restarted redis on mc2027 to pick up the replication change
* 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE
* 11:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE
* 11:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Needs fixing after [[phab:T274472|T274472]]
* 11:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Needs fixing after [[phab:T274472|T274472]]
* 11:08 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1022.eqiad.wmnet
* 11:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE
* 11:02 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1022.eqiad.wmnet
* 11:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE
* 10:40 elukey: drain + reimage analytics1059/1060 to Debian Buster
* 10:32 moritzm: uploaded screen 4.2.1-3+deb8u1+wmf1 to jessie-wikimedia
* 09:32 elukey: install linux 5.10 on an-worker[1097-1101] (GPU workers) and reboot them
* 09:30 kormat: disabling puppet on all db hosts while deploying a puppet monitoring change [[phab:T275497|T275497]]
* 09:19 moritzm: uploaded udplog 1.8.5+deb10u1 to buster-wikimedia
* 08:45 elukey@deploy1002: Finished deploy [analytics/refinery@605f8b8]: Fix for geoeditors monthly job (duration: 11m 03s)
* 08:33 elukey@deploy1002: Started deploy [analytics/refinery@605f8b8]: Fix for geoeditors monthly job
* 07:38 elukey: reboot an-worker1096 to pick up 5.10 kernel
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 [[phab:T276025|T276025]]', diff saved to https://phabricator.wikimedia.org/P14622 and previous config saved to /var/cache/conftool/dbconfig/20210304-062503-marostegui.json
* 06:11 marostegui: Stop MySQL on db2116 to clone db2145 [[phab:T275633|T275633]]
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P14621 and previous config saved to /var/cache/conftool/dbconfig/20210304-061134-marostegui.json
* 05:20 kart_: Updated apertium to 2021-03-03-170806-production ([[phab:T274262|T274262]])
* 05:15 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 05:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 05:10 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 01:24 twentyafterfour: phabricator upgrade complete
* 01:22 twentyafterfour: restarting php7.3-fpm on phab1001 to complete phabricator upgrade
* 00:02 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e47f735]: search_satisfaction_daily: make files readable by druid ingestion (duration: 25m 35s)
 
== 2021-03-03 ==
* 23:36 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e47f735]: search_satisfaction_daily: make files readable by druid ingestion
* 23:08 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet
* 22:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwmaint2001.codfw.wmnet
* 22:51 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry2003.codfw.wmnet
* 22:50 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet
* 22:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwmaint2001.codfw.wmnet
* 22:05 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2003.codfw.wmnet
* 21:58 mutante: puppetmaster1001 - signing puppet cert for gitlab1001.wikimedia.org ([[phab:T274459|T274459]])
* 21:53 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@7f37d40]: replace refinery-drop-hive-partitions with refinery-drop-older-than (duration: 01m 37s)
* 21:51 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@7f37d40]: replace refinery-drop-hive-partitions with refinery-drop-older-than
* 21:50 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2003.codfw.wmnet
* 21:30 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry1003.eqiad.wmnet
* 21:25 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry1003.eqiad.wmnet
* 21:21 legoktm@deploy1002:


==Archives==

Revision as of 01:06, 21 May 2022

2022-05-21

  • 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298555)', diff saved to https://phabricator.wikimedia.org/P28208 and previous config saved to /var/cache/conftool/dbconfig/20220521-010640-ladsgroup.json
  • 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298555)', diff saved to https://phabricator.wikimedia.org/P28207 and previous config saved to /var/cache/conftool/dbconfig/20220521-010626-ladsgroup.json
  • 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298555)', diff saved to https://phabricator.wikimedia.org/P28206 and previous config saved to /var/cache/conftool/dbconfig/20220521-001014-ladsgroup.json
  • 00:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 00:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1174.eqiad.wmnet with reason: Maintenance

2022-05-20

  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28205 and previous config saved to /var/cache/conftool/dbconfig/20220520-224558-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28204 and previous config saved to /var/cache/conftool/dbconfig/20220520-223054-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28203 and previous config saved to /var/cache/conftool/dbconfig/20220520-221550-ladsgroup.json
  • 22:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
  • 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P28202 and previous config saved to /var/cache/conftool/dbconfig/20220520-220046-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 21:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298555)', diff saved to https://phabricator.wikimedia.org/P28201 and previous config saved to /var/cache/conftool/dbconfig/20220520-215514-ladsgroup.json
  • 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 21:50 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 21:38 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye
  • 21:37 mutante: correction: mistake was to use FQDN T307142
  • 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError T307142
  • 21:36 mutante: attempt to use reimage cookbook failed: spicerack.netbox.NetboxHostNotFoundError
  • 21:34 mutante: reimaging gitlab1004 (insetup) to test partman recipe from gerrit:793534 - T307142
  • 21:34 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
  • 21:33 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1004.wikimedia.org with reason: reimage
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298555)', diff saved to https://phabricator.wikimedia.org/P28198 and previous config saved to /var/cache/conftool/dbconfig/20220520-190633-ladsgroup.json
  • 19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 18:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:55 mutante: [mwmaint1002:~] $ sudo mwscript initSiteStats.php --wiki=kcgwiki --update (to update statistics for latest wikipedia kcg) T305281
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5003.eqsin.wmnet with OS bullseye
  • 17:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 17:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5003.eqsin.wmnet with reason: host reimage
  • 16:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:37 robh@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti5003.eqsin.wmnet with OS bullseye
  • 16:33 robh: troubleshooting ganeti5003 ipmi failure via T308211
  • 16:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:09 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 16:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
  • 16:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2069.codfw.wmnet with OS bullseye
  • 15:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: sync
  • 15:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2068.codfw.wmnet with OS bullseye
  • 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
  • 15:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
  • 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2069.codfw.wmnet with OS bullseye
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
  • 15:17 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2068.codfw.wmnet with OS bullseye
  • 15:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1118 T', diff saved to https://phabricator.wikimedia.org/P28196 and previous config saved to /var/cache/conftool/dbconfig/20220520-151407-ladsgroup.json
  • 15:11 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28195 and previous config saved to /var/cache/conftool/dbconfig/20220520-150838-root.json
  • 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28194 and previous config saved to /var/cache/conftool/dbconfig/20220520-145334-root.json
  • 14:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2066.codfw.wmnet with OS bullseye
  • 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 10 hosts with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on 10 hosts with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298555)', diff saved to https://phabricator.wikimedia.org/P28193 and previous config saved to /var/cache/conftool/dbconfig/20220520-144212-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T298565)', diff saved to https://phabricator.wikimedia.org/P28192 and previous config saved to /var/cache/conftool/dbconfig/20220520-144111-ladsgroup.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28191 and previous config saved to /var/cache/conftool/dbconfig/20220520-143830-root.json
  • 14:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 14:28 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28190 and previous config saved to /var/cache/conftool/dbconfig/20220520-142327-root.json
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T303603)', diff saved to https://phabricator.wikimedia.org/P28189 and previous config saved to /var/cache/conftool/dbconfig/20220520-142032-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T303603)', diff saved to https://phabricator.wikimedia.org/P28188 and previous config saved to /var/cache/conftool/dbconfig/20220520-141316-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T303603)', diff saved to https://phabricator.wikimedia.org/P28187 and previous config saved to /var/cache/conftool/dbconfig/20220520-141308-ladsgroup.json
  • 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS bullseye
  • 14:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28186 and previous config saved to /var/cache/conftool/dbconfig/20220520-140823-root.json
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T303603)', diff saved to https://phabricator.wikimedia.org/P28185 and previous config saved to /var/cache/conftool/dbconfig/20220520-135350-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28184 and previous config saved to /var/cache/conftool/dbconfig/20220520-135319-root.json
  • 13:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T298565)', diff saved to https://phabricator.wikimedia.org/P28183 and previous config saved to /var/cache/conftool/dbconfig/20220520-134515-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 13:44 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 1%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P28182 and previous config saved to /var/cache/conftool/dbconfig/20220520-133815-root.json
  • 13:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: T308459
  • 13:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp2038.codfw.wmnet with reason: downtimed because of DIMM replacement: T308459
  • 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-tls
  • 13:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=varnish-fe
  • 13:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet,service=ats-be
  • 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T303603)', diff saved to https://phabricator.wikimedia.org/P28181 and previous config saved to /var/cache/conftool/dbconfig/20220520-132307-ladsgroup.json
  • 13:15 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye
  • 12:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye
  • 12:42 mforns@deploy1002: Finished deploy [airflow-dags/analytics@51a203f]: (no justification provided) (duration: 00m 07s)
  • 12:42 mforns@deploy1002: Started deploy [airflow-dags/analytics@51a203f]: (no justification provided)
  • 12:37 moritzm: copy prometheus-mcrouter-exporter from buster-wikimedia to bullseye-wikimedia (needed for T308214)
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T303603)', diff saved to https://phabricator.wikimedia.org/P28180 and previous config saved to /var/cache/conftool/dbconfig/20220520-123045-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T303603)', diff saved to https://phabricator.wikimedia.org/P28179 and previous config saved to /var/cache/conftool/dbconfig/20220520-123037-ladsgroup.json
  • 12:23 Amir1: killed refreshlinks suggestion in 10160
  • 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298555)', diff saved to https://phabricator.wikimedia.org/P28178 and previous config saved to /var/cache/conftool/dbconfig/20220520-121116-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:10 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
  • 11:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298555)', diff saved to https://phabricator.wikimedia.org/P28177 and previous config saved to /var/cache/conftool/dbconfig/20220520-114234-ladsgroup.json
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T303603)', diff saved to <