Server Admin Log: Difference between revisions

imported>Stashbot
(tzatziki: reset password for Philroc)
imported>Stashbot
(mutante: restbase-dev1006 has manually installed packages (wrk, maybe others))
(411 intermediate revisions by 4 users not shown)
== 2021-03-26 ==
* 22:27 tzatziki: reset password for Philroc
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 20:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 17:44 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - [[phab:T277795|T277795]] (duration: 01m 06s)
* 17:42 hashar@deploy1002: Finished scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]] (duration: 31m 43s)
* 17:10 hashar@deploy1002: Started scap: Revert "Add change tags for media additions/removals" - [[phab:T266067|T266067]] [[phab:T278429|T278429]]
* 15:40 Urbanecm: Delete `commonswiki:ip-autoblock:whitelist` cache key from memcached (wmf.36 moves the autoblock whitelist source, and it was deployed on commonswiki for a while, resulting in the cache key being empty)
* 15:37 hnowlan: importing imposm3_0.11.0+git20201104.4758cf4-1_amd64.changes on apt1001
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
* 13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
* 13:02 moritzm: reimaging theemin [[phab:T275873|T275873]]
* 12:56 moritzm: drain ganeti1014
* 12:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
* 12:37 moritzm: drain ganeti1013
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
* 12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
* 10:55 Urbanecm: Move `Help talk:Getting Started --> Help talk:Getting started` on enwiki with `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` ([[phab:T278350|T278350]])
* 10:49 Urbanecm: Move `User talk:TheAafi/Help talk` to `Help talk:Getting Started` via `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing [[:phab:T278350]]' -u 'Martin Urbanec' batch.txt` to fix an UBN task ([[phab:T278350|T278350]])
* 10:10 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts chlorine.eqiad.wmnet
* 10:02 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts chlorine.eqiad.wmnet
* 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts argon.eqiad.wmnet
* 09:49 filippo@deploy1002: Finished deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}} (duration: 00m 08s)
* 09:49 filippo@deploy1002: Started deploy [librenms/librenms@63e862a]: deploy {{Gerrit|I955cbfc244}}
* 09:46 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts argon.eqiad.wmnet
* 09:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts acrab.codfw.wmnet
* 09:43 moritzm: delete fermium in Ganeti (was still around, but powered down) [[phab:T224586|T224586]]
* 09:38 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts acrux.codfw.wmnet
* 09:36 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrab.codfw.wmnet
* 09:32 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrux.codfw.wmnet
* 09:31 filippo@deploy1002: Finished deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}} (duration: 00m 12s)
* 09:31 filippo@deploy1002: Started deploy [librenms/librenms@e7727e3]: deploy {{Gerrit|I12ac21d877c}}
* 09:28 moritzm: drain ganeti1012
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
* 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
* 08:38 moritzm: drain ganeti1010
* 08:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
* 08:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
* 06:11 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 05:06 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@bb5a072]: 0.3.68 (duration: 07m 31s)
* 05:00 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.68` on canary `wdqs1003`; proceeding to rest of fleet
* 04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@bb5a072]: 0.3.68
* 04:58 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.68`. Pre-deploy tests passing on canary `wdqs1003`
== 2022-06-23 ==
* 21:23 mutante: restbase-dev1006 has manually installed packages (wrk, maybe others)
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:22 brennen: end of utc late backport & config window
* 21:21 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808055{{!}}[cleanup] Drop non-existent feature flags]] (duration: 03m 33s)
* 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:13 thcipriani@deploy1002: Finished scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]] (duration: 17m 29s)
* 21:01 inflatador: looking in to wdqs1006 alert ^^
* 20:56 thcipriani@deploy1002: Started scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]]
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:49 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808064{{!}}Enable DiscussionTools topicsubscription, autotopicsub on testwiki (T310808)]] (duration: 03m 18s)
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dse-k8s-ctrl1001.eqiad.wmnet
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:48 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806847{{!}}ukwikibooks: Add NS102 (Рецепт) to wgContentNamespaces (T310940)]] (duration: 03m 41s)
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1001.eqiad.wmnet
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 mutante: cumin -b 15 -p 95 'mw1*' 'run-puppet-agent -q --failed-only'
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 mutante: cumin -b 15 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
* 20:09 mutante: cumin -b 15 -p 95 'parse*' 'run-puppet-agent -q --failed-only'
* 20:07 mutante: cumin -b 15 -p 95 'wtp*' 'run-puppet-agent -q --failed-only'
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:39 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:34 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:24 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:21 ejegg: fundraising python tools updated from {{Gerrit|40d376d4}} to {{Gerrit|acf89fb2}}
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:38 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]]
* 18:01 brennen: train 1.39.0-wmf.17 ([[phab:T308070|T308070]]): no current blockers - rolling to all wikis
* 18:01 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:53 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:32 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:00 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:17 hashar: Upgrading CI Jenkins # [[phab:T311174|T311174]]
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:807902{{!}}Do not re-use "wikibase_config" for registering the language selector... (T307869)]] (duration: 03m 22s)
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30042 and previous config saved to /var/cache/conftool/dbconfig/20220623-150954-root.json
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30041 and previous config saved to /var/cache/conftool/dbconfig/20220623-150951-root.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30040 and previous config saved to /var/cache/conftool/dbconfig/20220623-150422-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30039 and previous config saved to /var/cache/conftool/dbconfig/20220623-145450-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30038 and previous config saved to /var/cache/conftool/dbconfig/20220623-145448-root.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30037 and previous config saved to /var/cache/conftool/dbconfig/20220623-144918-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30036 and previous config saved to /var/cache/conftool/dbconfig/20220623-143946-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30035 and previous config saved to /var/cache/conftool/dbconfig/20220623-143944-root.json
* 14:34 papaul: ongoing PDU maintenance in rack A3 codfw
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30034 and previous config saved to /var/cache/conftool/dbconfig/20220623-143414-root.json
* 14:31 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:30 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30033 and previous config saved to /var/cache/conftool/dbconfig/20220623-142443-root.json
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30032 and previous config saved to /var/cache/conftool/dbconfig/20220623-142440-root.json
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30031 and previous config saved to /var/cache/conftool/dbconfig/20220623-141910-root.json
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 taavi@deploy1002: Synchronized php-1.39.0-wmf.17/includes/skins/Skin.php: Backport: [[gerrit:807900{{!}}Skin: Change viewport based on feedback (T311119)]] (duration: 03m 29s)
* 14:10 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30030 and previous config saved to /var/cache/conftool/dbconfig/20220623-140939-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30029 and previous config saved to /var/cache/conftool/dbconfig/20220623-140936-root.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30028 and previous config saved to /var/cache/conftool/dbconfig/20220623-140406-root.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:02 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:00 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:00 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 13:58 moritzm: import jenkins 2.346.1 to thirdparty/ci [[phab:T311174|T311174]]
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30027 and previous config saved to /var/cache/conftool/dbconfig/20220623-135435-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30026 and previous config saved to /var/cache/conftool/dbconfig/20220623-135432-root.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30025 and previous config saved to /var/cache/conftool/dbconfig/20220623-134902-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30024 and previous config saved to /var/cache/conftool/dbconfig/20220623-133931-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30023 and previous config saved to /var/cache/conftool/dbconfig/20220623-133928-root.json
* 13:38 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (2/2) (duration: 03m 26s)
* 13:34 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (1/2) (duration: 03m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30022 and previous config saved to /var/cache/conftool/dbconfig/20220623-133358-root.json
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182 db1184 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30021 and previous config saved to /var/cache/conftool/dbconfig/20220623-132951-root.json
* 13:27 sukhe: disable puppet on A:durum or A:wikidough or A:centrallog or A:dns-rec: deploying [[phab:T310574|T310574]]
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30020 and previous config saved to /var/cache/conftool/dbconfig/20220623-132729-root.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30019 and previous config saved to /var/cache/conftool/dbconfig/20220623-132133-root.json
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30018 and previous config saved to /var/cache/conftool/dbconfig/20220623-132128-root.json
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807050{{!}}[ImageSuggestions] Enable extension on ptwiki, ruwiki & idwiki (T302711)]] (duration: 03m 44s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30017 and previous config saved to /var/cache/conftool/dbconfig/20220623-130629-root.json
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30016 and previous config saved to /var/cache/conftool/dbconfig/20220623-130624-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30015 and previous config saved to /var/cache/conftool/dbconfig/20220623-125553-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30014 and previous config saved to /var/cache/conftool/dbconfig/20220623-125547-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30013 and previous config saved to /var/cache/conftool/dbconfig/20220623-125125-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30012 and previous config saved to /var/cache/conftool/dbconfig/20220623-125120-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30011 and previous config saved to /var/cache/conftool/dbconfig/20220623-124049-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30010 and previous config saved to /var/cache/conftool/dbconfig/20220623-124043-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30009 and previous config saved to /var/cache/conftool/dbconfig/20220623-123621-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30008 and previous config saved to /var/cache/conftool/dbconfig/20220623-123616-root.json
* 12:26 moritzm: installing waitress security updates
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30007 and previous config saved to /var/cache/conftool/dbconfig/20220623-122545-root.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30006 and previous config saved to /var/cache/conftool/dbconfig/20220623-122539-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30005 and previous config saved to /var/cache/conftool/dbconfig/20220623-122118-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30004 and previous config saved to /var/cache/conftool/dbconfig/20220623-122112-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30003 and previous config saved to /var/cache/conftool/dbconfig/20220623-121041-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30002 and previous config saved to /var/cache/conftool/dbconfig/20220623-121035-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30001 and previous config saved to /var/cache/conftool/dbconfig/20220623-120614-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30000 and previous config saved to /var/cache/conftool/dbconfig/20220623-120608-root.json
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29999 and previous config saved to /var/cache/conftool/dbconfig/20220623-115537-root.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29998 and previous config saved to /var/cache/conftool/dbconfig/20220623-115532-root.json
* 11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29997 and previous config saved to /var/cache/conftool/dbconfig/20220623-115110-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29996 and previous config saved to /var/cache/conftool/dbconfig/20220623-115104-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128 db1129 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29995 and previous config saved to /var/cache/conftool/dbconfig/20220623-114159-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29994 and previous config saved to /var/cache/conftool/dbconfig/20220623-114033-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29993 and previous config saved to /var/cache/conftool/dbconfig/20220623-114028-root.json
* 11:32 kart_: Updated cxserver to 2022-06-23-052732-production ([[phab:T311196|T311196]])
* 11:31 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 11:31 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:30 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:29 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:28 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29992 and previous config saved to /var/cache/conftool/dbconfig/20220623-112529-root.json
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29991 and previous config saved to /var/cache/conftool/dbconfig/20220623-112524-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 es1024 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29990 and previous config saved to /var/cache/conftool/dbconfig/20220623-110804-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29989 and previous config saved to /var/cache/conftool/dbconfig/20220623-105333-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29988 and previous config saved to /var/cache/conftool/dbconfig/20220623-105326-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29987 and previous config saved to /var/cache/conftool/dbconfig/20220623-105320-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29986 and previous config saved to /var/cache/conftool/dbconfig/20220623-103829-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29985 and previous config saved to /var/cache/conftool/dbconfig/20220623-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29984 and previous config saved to /var/cache/conftool/dbconfig/20220623-103816-root.json
* 10:25 jayme: running restart-php7.2-fpm A:parsoid or A:mw or A:mw-api to disable opcache revalidation - [[phab:T266055|T266055]]
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29983 and previous config saved to /var/cache/conftool/dbconfig/20220623-102325-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29982 and previous config saved to /var/cache/conftool/dbconfig/20220623-102318-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29981 and previous config saved to /var/cache/conftool/dbconfig/20220623-102312-root.json
* 10:21 XioNoX: fix eqiad lvs switch port MTU
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29980 and previous config saved to /var/cache/conftool/dbconfig/20220623-100822-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29979 and previous config saved to /var/cache/conftool/dbconfig/20220623-100815-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29978 and previous config saved to /var/cache/conftool/dbconfig/20220623-100808-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29977 and previous config saved to /var/cache/conftool/dbconfig/20220623-095318-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29976 and previous config saved to /var/cache/conftool/dbconfig/20220623-095311-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29975 and previous config saved to /var/cache/conftool/dbconfig/20220623-095304-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29973 and previous config saved to /var/cache/conftool/dbconfig/20220623-093814-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29972 and previous config saved to /var/cache/conftool/dbconfig/20220623-093807-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29971 and previous config saved to /var/cache/conftool/dbconfig/20220623-093800-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29970 and previous config saved to /var/cache/conftool/dbconfig/20220623-092310-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29969 and previous config saved to /var/cache/conftool/dbconfig/20220623-092303-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29968 and previous config saved to /var/cache/conftool/dbconfig/20220623-092256-root.json
* 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 db1179 db1180 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29967 and previous config saved to /var/cache/conftool/dbconfig/20220623-090842-root.json
* 09:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:52 joal@deploy1002: Finished deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs (duration: 00m 08s)
* 08:52 joal@deploy1002: Started deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs
* 08:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 7 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 7 hosts with reason: Reboots
* 07:39 moritzm: installing firejail security updates
* 07:36 TheresNoTime: UTC morning deploys done
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:25 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806365{{!}}GrowthExperiments: Enable link recommendations frontend, round 4 (T304548)]] (duration: 03m 37s)
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: Reboots
* 00:35 brennen: end of phabricator maintenance window
* 00:13 brennen: phabricator deploy finished ([[phab:T311175|T311175]])
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance


== 2021-03-25 ==
== 2022-06-22 ==
* 23:47 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/3D/package.json: No-op demo sync (duration: 01m 07s)
* 22:56 tzatziki: removing 1 file for legal compliance
* 23:37 stran@deploy1002: Synchronized README: (no justification provided) (duration: 01m 06s)
* 21:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
* 23:20 jhuneidi@deploy1002: Synchronized README: [[gerrit:674984{{!}}DEMO: README]] (duration: 01m 07s)
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 to resolve Old GC Hell alert
* 22:59 brennen: no patches for upcoming deploy window, but we'll be conducting a deployment training using DEMO patches to READMEs.
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad to resolve Old GC Hell alert
* 22:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
* 21:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1006.eqiad.wmnet with OS bullseye
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:49 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44] (duration: 01m 18s)
* 21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:48 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44]
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 21:27 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:27 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 19:48 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 and 2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 20:24 cjming: end of UTC late backport window
* 19:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.35 - [[phab:T274940|T274940]]
* 20:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 19:36 hashar@deploy1002: sync-wikiversions aborted: (no justification provided) (duration: 00m 03s)
* 20:19 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44] (duration: 07m 36s)
* 19:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:04 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 08s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce7d2d7a51bd2e3717b4de7b2f7e8ae427c221ad}}: ruwiki: flaggedrevs: Delete autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 06s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:59 Urbanecm: `mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' ` finished ([[phab:T275337|T275337]])
* 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807593{{!}}gawiki: Change category collation from `uppercase` to `uca-ga-u-kn` (T311136)]] (duration: 03m 39s)
* 18:53 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sturm . # [[phab:T278391|T278391]]
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS bullseye
* 18:50 Urbanecm: mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' # [[phab:T275337|T275337]]
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44]
* 18:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|39cd4f15a3900783ac0e9a213004a28f18298a23}}: ruwiki: flaggedrevs: Do not allow sysops to modify users in autoeditor group ([[phab:T275337|T275337]]) (duration: 01m 09s)
* 20:11 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44] (duration: 00m 07s)
* 18:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dcfb7feaace1f397169e5e1bab7efd4e5f605a0f}}: ruwiki: flaggedrevs: Do not remove autoreview group ([[phab:T275337|T275337]]) (duration: 01m 14s)
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44]
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|3fb664682bea3c4d1448b0937f938e810268bac3}}: ruwiki: flaggedrevs: Revoke review from sysop group ([[phab:T275811|T275811]]) (duration: 01m 06s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:29 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (3/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 20:10 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44] (duration: 06m 16s)
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (2/3; [[phab:T275819|T275819]]) (duration: 01m 06s)
* 20:03 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44]
* 18:26 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|29660f9ae8468aac1578b2905606ba9dd41d095f}}: Update altwiki logo (1/3; [[phab:T275819|T275819]]) (duration: 01m 10s)
* 20:03 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44] (duration: 30m 58s)
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|62be4e738a4fd45256027bb09b010ab152f19850}}: Disable magic links on enwiki ([[phab:T275951|T275951]]) (duration: 01m 20s)
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 18:14 mutante: alert1001 - sudo systemctl restart tcpircbot-logmsgbot
* 19:42 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 18:09 marxarelli: scap sync-file .pipeline Config: [[gerrit:674132{{!}}Include patches in restricted image (T271274)]]
* 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection (duration: 02m 03s)
* 18:06 hnowlan: draining and restarting aqs1004-b cassandra
* 19:37 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection
* 17:45 hnowlan: draining and restarting aqs1004-a cassandra
* 19:32 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44]
* 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 19:31 aqu: Deploying analytics/refinery (weekly train)
* 17:14 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 17:08 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 19:14 herron: bounced apache on lists1001
* 16:39 hashar: Restarted Apache 2 on contint2001 / contint1001
* 19:06 hashar: Restarting CI Jenkins
* 16:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 16:46 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1009.eqiad.wmnet with OS bullseye
* 16:32 moritzm: restarting apache on an-tool1007/turnilo
* 16:45 hashar: Restarting CI Jenkins
* 16:27 moritzm: restarting dnsdist/rdns-recursor on malmok
* 16:43 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
* 16:24 jbond42: restart slapd on ldap-replica
* 16:33 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:22 jbond42: restart slapd on ldap-corp
* 16:29 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:20 jbond42: restart apache on lists1002
* 16:18 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:18 jbond42: restart apache on netbox
* 16:14 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 16:13 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Disallow negative or decimal values in pages tag - [[phab:T278400|T278400]] (duration: 01m 32s)
* 16:13 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:12 jbond42: restart routinator on rpki*
* 16:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 16:12 moritzm: restarting nginx on apt*
* 16:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 16:10 moritzm: restarting apache on dbmonitor
* 16:08 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 16:08 moritzm: restart Apache on matomo/piwik
* 16:06 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 16:03 jbond42: restart apache service on gerrit
* 16:05 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 16:02 jbond42: restart idp service
* 16:04 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 16:01 ema: A:cp rolling ats-<nowiki>{</nowiki>tls,backend<nowiki>}</nowiki>-restart for openssl upgrades -- https://www.openssl.org/news/secadv/20210325.txt
* 15:36 moritzm: upload jenkins 2.332.4 to apt.wikimedia.org [[phab:T311068|T311068]]
* 15:45 moritzm: installing openssl updates on buster
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:48 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
* 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:13 twentyafterfour: update phabricator again (last night's update undid a hotfix that is now fixed properly)
* 15:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:45 moritzm: drain ganeti1009
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
* 13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
* 13:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 13:27 moritzm: reduce webperf1001/webperf2001 to 4G RAM (xhgui has been split off to separate VMs)
* 15:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
* 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
* 15:00 jayme: published docker-registry.discovery.wmnet/helm-state-metrics:0.1.0-1 - [[phab:T310714|T310714]]
* 12:52 hnowlan: aqs1004 nodetool-a cleanup finished
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 12:14 moritzm: drain ganeti1008
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
* 14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
* 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 11:52 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:674861{{!}}Disable Legacy javascript in fawikiquote]] ([[phab:T72470|T72470]]) (duration: 01m 07s)
* 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 11:46 moritzm: drain ganeti1007
* 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 11:44 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/skins/Vector/resources: [[gerrit:674382{{!}}Inform anonymous A/B test by tracking time from navigationStart (T275807)]] (duration: 01m 09s)
* 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 11:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
* 14:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
* 11:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
* 14:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
* 11:33 ladsgroup@deploy1002: Synchronized dblists/: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]], Part II ([[phab:T278369|T278369]]) (duration: 01m 07s)
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:32 ladsgroup@deploy1002: Synchronized wmf-config: [[gerrit:674857{{!}}tawiki: Enable Growth features in dark mode]] ([[phab:T278369|T278369]]) (duration: 01m 30s)
* 14:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
* 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
* 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
* 14:09 Lucas_WMDE: UTC afternoon backport+config window done
* 11:10 moritzm: drain ganeti1006
* 14:09 lucaswerkmeister-wmde@deploy1002: Synchronized logos/manage.py: Config: [[gerrit:807486{{!}}logos: Update phpcs comment]] (should be a no-op but syncing just in case) (duration: 03m 19s)
* 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
* 14:04 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:01 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/%s\n' specieswiki<nowiki>{</nowiki>,-<nowiki>{</nowiki>1.5,2<nowiki>}</nowiki>x<nowiki>}</nowiki>.png {{!}} mwscript purgeList.php # [[phab:T310961|T310961]]
* 10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:01 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (3/3) (duration: 03m 30s)
* 10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (2/3) (duration: 03m 29s)
* 10:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
* 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:42 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:56 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
* 13:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
* 10:36 hnowlan: running general nodetool cleanup on aqs1004-a
* 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (1/3) (duration: 03m 46s)
* 10:35 hnowlan: running cleanup on aqs1004-a: nodetool-a cleanup "local_group_default_T_pageviews_per_project_v2" data
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:34 moritzm: drain ganeti1005
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 13:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
* 10:24 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 13:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:18 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (2/2) (duration: 03m 39s)
* 10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (1/2) (duration: 03m 35s)
* 09:26 moritzm: drain ganeti2024
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 09:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:45 moritzm: drain ganeti2023
* 13:28 XioNoX: fix MTU on eqiad server facing switch ports
* 08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:12 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2 for buster-wikimedia
* 13:27 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 08:11 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:41 legoktm: upgraded lists1002 to hyperkitty 1.2.2-1+wmf1 ([[phab:T276687|T276687]])
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:36 legoktm: uploaded hyperkitty 1.2.2-1+wmf1 to buster-wikimedia ([[phab:T276687|T276687]])
* 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:807255{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (2/3) (T304328)]] (duration: 03m 35s)
* 07:35 jynus: restart db2135 [[phab:T278408|T278408]] [[phab:T273281|T273281]]
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:05 effie: enable puppet on all mediawiki servers
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 06:57 XioNoX: Option 82: use-vlan-id
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:53 effie: enable puppet on jobrunners
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 06:47 effie: enable puppet on parsoid
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 06:40 effie: disable puppet on all mediawiki servers to merge 673061 (service proxy to listen on ::1)
* 13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 06:23 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 05:19 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807254{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (1/3) (T304328)]] (duration: 03m 35s)
* 04:44 legoktm: restarted exim4 on lists1002 so it listens on 0.0.0.0 instead of 127.0.0.1
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 04:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 03:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 13:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
* 01:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 13:10 XioNoX: fix MTU in drmrs
* 01:10 legoktm: mailman3: added lists-next.wikimedia.org domain
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:08 legoktm: mailman3: renamed default site from "example.com" to "lists-next.wikimedia.org"
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:807211{{!}}[wmf-config]: Deploy GDI Survey Wave 2 - BETA (T311079)]] (duration: 03m 29s)
* 00:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
* 12:58 XioNoX: fix MTU on codfw switches access ports
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
* 12:57 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2777.codfw.wmnet
* 12:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 00:34 mutante: mw2377, mw2378 - first scap pull
* 12:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2378.codfw.wmnet
* 12:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
* 00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2377.codfw.wmnet
* 12:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2378.codfw.wmnet
* 12:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2377.codfw.wmnet
* 12:23 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 00:29 legoktm: syncing facts for puppet-compiler
* 12:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 00:23 mutante: mw2377, mw2378 - reboot
* 12:18 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 00:14 twentyafterfour: phabricator update complete
* 12:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 00:10 twentyafterfour: deploying phabricator
* 12:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 00:05 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T23:55:35` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 12:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 12:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 11:46 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 11:41 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 11:11 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 20s)
* 11:10 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 11:09 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 11s)
* 11:08 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 11:07 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 02m 54s)
* 11:05 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 10:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
* 10:53 jayme: systemctl restart rsyslog on kubernetes2008
* 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
* 10:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
* 10:41 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
* 10:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
* 10:36 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
* 10:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
* 10:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
* 10:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
* 10:17 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
* 10:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
* 10:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 10:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
* 10:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 10:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2003.codfw.wmnet
* 10:04 moritzm: installing vim security updates
* 09:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 09:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
* 09:35 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 09:35 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 09:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 09:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
* 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 08:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
* 08:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29964 and previous config saved to /var/cache/conftool/dbconfig/20220622-084234-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29963 and previous config saved to /var/cache/conftool/dbconfig/20220622-084225-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29962 and previous config saved to /var/cache/conftool/dbconfig/20220622-084206-root.json
* 08:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29961 and previous config saved to /var/cache/conftool/dbconfig/20220622-082730-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29960 and previous config saved to /var/cache/conftool/dbconfig/20220622-082721-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29959 and previous config saved to /var/cache/conftool/dbconfig/20220622-082702-root.json
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 08:18 marostegui: Upgrade kernel and reboot on db[1111,1132,1143,1127].eqiad.wmnet
* 08:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 08:15 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.17 refs [[phab:T308070|T308070]] (duration: 03m 43s)
* 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29957 and previous config saved to /var/cache/conftool/dbconfig/20220622-081227-root.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29956 and previous config saved to /var/cache/conftool/dbconfig/20220622-081217-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29955 and previous config saved to /var/cache/conftool/dbconfig/20220622-081159-root.json
* 08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.17 refs [[phab:T308070|T308070]]
* 08:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
* 08:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
* 08:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
* 08:04 hashar: Updating operations-puppet-tests-buster-docker Jenkins job to use the latest Docker image (rebuild to catch up with latest defined gems). https://gerrit.wikimedia.org/r/c/integration/config/+/807478
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29954 and previous config saved to /var/cache/conftool/dbconfig/20220622-075721-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29953 and previous config saved to /var/cache/conftool/dbconfig/20220622-075713-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29952 and previous config saved to /var/cache/conftool/dbconfig/20220622-075655-root.json
* 07:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 07:53 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
* 07:50 marostegui: Upgrade kernel and reboot on db[2145-2150].codfw.wmnet
* 07:49 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29951 and previous config saved to /var/cache/conftool/dbconfig/20220622-074217-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29950 and previous config saved to /var/cache/conftool/dbconfig/20220622-074209-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29949 and previous config saved to /var/cache/conftool/dbconfig/20220622-074151-root.json
* 07:40 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 07:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 07:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29948 and previous config saved to /var/cache/conftool/dbconfig/20220622-072714-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29947 and previous config saved to /var/cache/conftool/dbconfig/20220622-072705-root.json
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29946 and previous config saved to /var/cache/conftool/dbconfig/20220622-072647-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29945 and previous config saved to /var/cache/conftool/dbconfig/20220622-071210-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29944 and previous config saved to /var/cache/conftool/dbconfig/20220622-071201-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29943 and previous config saved to /var/cache/conftool/dbconfig/20220622-071143-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 es1026 es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29942 and previous config saved to /var/cache/conftool/dbconfig/20220622-065507-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Switchover es1, es2 and es3 masters', diff saved to https://phabricator.wikimedia.org/P29941 and previous config saved to /var/cache/conftool/dbconfig/20220622-065208-marostegui.json
* 05:52 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 01:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:17 tstarling@deploy1002: Synchronized wmf-config/mc-labs.php: for completeness (duration: 03m 41s)
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:13 tstarling@deploy1002: Synchronized wmf-config/mc.php: g 807158 [[phab:T278392|T278392]] (duration: 03m 35s)
* 01:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-03-24 ==
== 2022-06-21 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 20:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b42e57d75ec6b0536493fa073805a0bcb066aef1}}: zhwikiquote: Disable local upload ([[phab:T311017|T311017]]) (duration: 03m 43s)
* 23:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:48 mutante: generating new mcrouter certs for mw2377, mw2378
* 20:22 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (2/2) (duration: 03m 28s)
* 22:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:07 legoktm: disabled puppet on lists1002 while mailman3-web is broken
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:19 mutante: webperf2001 - restarted apache
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:11 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 07s)
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (1/2) (duration: 03m 30s)
* 21:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f70e302e11756d9704acc86c45b3d7aabf31c4d}}: fawiktionary: Enable SandboxLink extension ([[phab:T308505|T308505]]) (duration: 03m 37s)
* 21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:07 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GrowthExperiments: LinkRecommendation: Modify path args for calls to API - [[phab:T277865|T277865]] (duration: 01m 07s)
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:05 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Revert "Add default TemplateStyles for an Index" - [[phab:T278379|T278379]] (duration: 01m 07s)
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 19:38 dancy@deploy1002: backport aborted: (duration: 00m 10s)
* 21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:38 dancy@deploy1002: Installation of scap version "4.9.5" completed for 558 hosts
* 21:02 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GlobalUsage: Fix hook registration after class was namespaced - [[phab:T278375|T278375]] (duration: 01m 07s)
* 19:38 dancy@deploy1002: Installing scap version "4.9.5" for 558 hosts
* 20:59 hashar@deploy1002: Synchronized wmf-config/env.php: multiversion: Move '@' operator in env.php closer to relevant statement (duration: 01m 07s)
* 19:22 urandom: replicating Cassandra `system_auth` keyspace to codfw -- [[phab:T307641|T307641]]
* 20:56 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:56 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2` failed due to syntax error, patch here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/807200
* 20:30 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:48 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2`
* 20:26 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1001.wikimedia.org
* 20:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:13 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 20:10 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp1001.wikimedia.org
* 20:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2001.wikimedia.org
* 20:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 17:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host elastic1049.eqiad.wmnet
* 19:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 19:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp2001.wikimedia.org
* 19:57 ryankemper: [[phab:T267927|T267927]] Host key is missing for `wdqs2008`, causing the `data-transfer` cookbook to fail; looking into resolving
* 17:14 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 19:55 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:09 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1049.eqiad.wmnet
* 19:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 19:50 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:01 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 19:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 16:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
* 19:49 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 19:49 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 16:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 19:45 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 16:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
* 19:45 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
* 19:42 ryankemper: [[phab:T267927|T267927]] Re-enabled puppet on `wdqs2008` and ran puppet agent
* 15:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 19:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 15:55 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2048.codfw.wmnet
* 19:14 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 to 1.36.0-wmf.35
* 15:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
* 19:07 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 21s)
* 15:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
* 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
* 15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 19:03 urbanecm@deploy1002: Synchronized wmf-config/config/shwiki.yaml: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 3/3) (duration: 01m 08s)
* 15:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 19:02 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 2/3) (duration: 01m 06s)
* 15:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (2/2) (duration: 03m 28s)
* 19:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f3aa7278d17c88f27b7d58ceede82730fd4ddcd}}: shwiki: Enable Growth features in dark mode ([[phab:T278240|T278240]]; 1/3) (duration: 01m 07s)
* 15:37 klausman: restarting pybal on lvs2009
* 18:54 urbanecm@deploy1002: Synchronized wmf-config/config/eswiki.yaml: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 3/3) (duration: 01m 06s)
* 15:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
* 18:53 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 2/3) (duration: 01m 07s)
* 15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 18:52 urbanecm@deploy1002: sync-file aborted: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode (2/3) (duration: 00m 01s)
* 15:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (1/2) (duration: 03m 51s)
* 18:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ced092071a9638d1e1c04602bd5bbed5cc3812e3}}: Enable Growth features on eswiki in dark mode ([[phab:T278235|T278235]]; 1/3) (duration: 01m 08s)
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:49 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:45 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:42 legoktm@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:30 klausman: Restarting pybal on lvs2010
* 18:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5aa050602954a3cab0c7e0c4b10efb0f957efb59}}: Promote several Growth target wikis out of dark mode ([[phab:T277491|T277491]]; [[phab:T276830|T276830]]; [[phab:T276123|T276123]]; [[phab:T276816|T276816]]; [[phab:T275550|T275550]]; [[phab:T276450|T276450]]) (duration: 01m 08s)
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|333393dfe59deb0ec4d7df6dd92372a705f65b85}}: Add autopatrol to autoreviewers in en.wikibooks ([[phab:T278300|T278300]]) (duration: 01m 09s)
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 18:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2002.codfw.wmnet
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2001.codfw.wmnet
* 17:25 effie: upgrade memcached on mc-gp* hosts
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging-ctrl2002.codfw.wmnet
* 15:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 15:42 moritzm: reduce RAM for irc2001 to 2G; it was originally created with 8G [[phab:T224579|T224579]]
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 15:35 effie: enable puppet on all mediawiki + memcached hosts
* 15:16 klausman@cumin1001: conftool action : help; selector: name=ml-staging2001
* 15:20 moritzm: drain ganeti2022
* 15:15 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
* 15:06 moritzm: installing avahi security updates
* 14:35 moritzm: drain ganeti2021
* 15:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:31 effie: disable puppet on all mediawiki servers + memcached for 674290
* 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:05 moritzm: failover Ganeti master in codfw to ganeti2019
* 15:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
* 15:01 papaul: PDU swap for rack a2 complete
* 13:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
* 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:29 moritzm: installing irc1001
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:15 moritzm: drain ganeti2020
* 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
* 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
* 14:24 papaul: ongoing maintenance on ps1-a2-codfw
* 12:28 effie: enabling puppet on mediawiki and memcached servers
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:10 jynus: restart dbprov200[12] [[phab:T271913|T271913]]
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15076 and previous config saved to /var/cache/conftool/dbconfig/20210324-115940-root.json
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:57 Andrew-WMDE_: EU deploys done
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
* 11:53 jynus: restart dbprov100[12] [[phab:T271913|T271913]]
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:51 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/MassMessage/: Backport: [[gerrit:674367{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 08s)
* 13:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 11:49 effie: disable puppet on all hosts running mediawiki+memcached to merge 674282
* 13:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
* 11:45 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/MassMessage/: Backport: [[gerrit:674366{{!}}MassMessage: Unbreak remote content fetching (T276936)]] (duration: 01m 07s)
* 13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15075 and previous config saved to /var/cache/conftool/dbconfig/20210324-114436-root.json
* 13:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15074 and previous config saved to /var/cache/conftool/dbconfig/20210324-112932-root.json
* 13:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
* 11:22 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673326{{!}}Enable CodeMirror accessibility colors on initial wikis (T276346)]] (duration: 01m 08s)
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:15 jynus: restart serially db2097 db2098 db2099 db2100 [[phab:T271913|T271913]]
* 13:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
* 11:14 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:673312{{!}}Enable bracket matching on group0 and wikitech (T273591)]] (duration: 01m 25s)
* 13:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15073 and previous config saved to /var/cache/conftool/dbconfig/20210324-111429-root.json
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:50 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1001.wikimedia.org
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:28 daniel@deploy1002: Synchronized rpc/: Config: [[gerrit:805775{{!}}rpc: Remove unused RunJobs.php (T175146 T243096)]] (duration: 03m 45s)
* 10:45 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:44 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
* 10:36 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host irc1001.wikimedia.org
* 13:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 10:31 jynus: restart db1171 [[phab:T271913|T271913]]
* 13:05 moritzm: installing Linux 5.10.120-1~bpo10+1 on buster hosts with backports kernel
* 10:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 10:14 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 10:14 jynus: restart db1145 [[phab:T271913|T271913]]
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
* 10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 12:56 moritzm: installing haproxy security updates on stretch
* 10:03 jynus: restart db1139 [[phab:T271913|T271913]]
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15072 and previous config saved to /var/cache/conftool/dbconfig/20210324-095655-marostegui.json
* 12:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15071 and previous config saved to /var/cache/conftool/dbconfig/20210324-095606-root.json
* 12:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
* 09:51 jynus: restart db1116 [[phab:T271913|T271913]]
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15070 and previous config saved to /var/cache/conftool/dbconfig/20210324-094102-root.json
* 12:43 moritzm: installing python-bottle security updates
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
* 09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15069 and previous config saved to /var/cache/conftool/dbconfig/20210324-092558-root.json
* 12:25 moritzm: reset logster-csp/logster-badpass-priv on mwlog1002, these were removed from Puppet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15068 and previous config saved to /var/cache/conftool/dbconfig/20210324-091055-root.json
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 08:29 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
* 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 08:16 gehel: restarting wdqs updater on all nodes for config change
* 12:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 11:59 mbsantos: mbsantos@maps2009 imposm-removebackup-import ([[phab:T305845|T305845]])
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15066 and previous config saved to /var/cache/conftool/dbconfig/20210324-081057-root.json
* 11:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15065 and previous config saved to /var/cache/conftool/dbconfig/20210324-080725-root.json
* 11:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for schema change', diff saved to https://phabricator.wikimedia.org/P15064 and previous config saved to /var/cache/conftool/dbconfig/20210324-080223-marostegui.json
* 11:43 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-main
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for testing', diff saved to https://phabricator.wikimedia.org/P29936 and previous config saved to /var/cache/conftool/dbconfig/20220621-114232-root.json
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for testing', diff saved to https://phabricator.wikimedia.org/P29935 and previous config saved to /var/cache/conftool/dbconfig/20220621-114216-root.json
* 08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for testing', diff saved to https://phabricator.wikimedia.org/P29934 and previous config saved to /var/cache/conftool/dbconfig/20220621-114151-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15063 and previous config saved to /var/cache/conftool/dbconfig/20210324-075553-root.json
* 10:57 volans: deleting netbox getstats.GetDeviceStats job results - [[phab:T311048|T311048]]
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15062 and previous config saved to /var/cache/conftool/dbconfig/20210324-075221-root.json
* 10:51 kart_: Updated cxserver to 2022-06-21-035954-production ([[phab:T307970|T307970]])
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-main
* 10:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-logging-external
* 10:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=zotero
* 10:47 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 07:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
* 10:47 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15061 and previous config saved to /var/cache/conftool/dbconfig/20210324-074050-root.json
* 10:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15060 and previous config saved to /var/cache/conftool/dbconfig/20210324-073718-root.json
* 10:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 07:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
* 10:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P15059 and previous config saved to /var/cache/conftool/dbconfig/20210324-072319-marostegui.json
* 09:31 urbanecm: 09:29:23 Synchronized wmf-config/throttle.php: {{Gerrit|7c9f6a561b2b4b5c5db063bad83bd23e9cbac347}}: Add a throttle rule for a Czech course ([[phab:T310885|T310885]]) (duration: 03m 34s) #manually logging in logmsgbot's absence
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15058 and previous config saved to /var/cache/conftool/dbconfig/20210324-072214-root.json
* 09:20 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 07:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd2002.codfw.wmnet
* 09:13 marostegui: dbmaint s8@codfw [[phab:T310011|T310011]]
* 07:10 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts ml-etcd2002.codfw.wmnet
* 08:29 marostegui: Reboot db1120 for kernel upgrade
* 07:09 moritzm: installing squid security updates
* 08:14 moritzm: remove EOLed parsoid debs from releases.wikimedia.org [[phab:T309765|T309765]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1181 to dbctl, depooled [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15057 and previous config saved to /var/cache/conftool/dbconfig/20210324-063459-marostegui.json
* 05:54 marostegui: Reboot db1132 and db1181 for kernel upgrade
* 06:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1084.eqiad.wmnet
* 06:14 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1084.eqiad.wmnet
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P15056 and previous config saved to /var/cache/conftool/dbconfig/20210324-055246-marostegui.json
* 04:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 03:41 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 03:41 ryankemper: [[phab:T274204|T274204]] Restarting the `codfw` cluster reboot run; the timestamp argument should prevent it from wasting time on nodes that have already been rebooted
* 03:40 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 03:39 ryankemper: [[phab:T274204|T274204]] Timed out waiting for write queues to empty: `[59/60, retrying in 60.00s] Attempt to run 'spicerack.elasticsearch_cluster.ElasticsearchClusters.wait_for_all_write_queues_empty' raised: Write queue not empty (had value of 241631) for partition 0 of topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.`
* 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
* 02:38 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 02:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 01:59 ryankemper: [[phab:T274204|T274204]] For now I'll proceed to the reboots of `codfw`
* 01:59 ryankemper: [[phab:T274204|T274204]] `ctrl+c`'d out of run; relforge is relying on outdated config that is trying to talk to `relforge1002` which no longer exists. Need to refactor so that config no longer lives in spicerack
* 01:58 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade-reboot (exit_code=97)
* 01:49 ryankemper: [[phab:T274204|T274204]] `sudo -i cookbook sre.elasticsearch.rolling-upgrade-reboot relforge "relforge cluster restarts" --task-id [[phab:T274204|T274204]] --nodes-per-run 3 --start-datetime 2021-03-24T01:45:59+00:00` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
* 01:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade-reboot
* 01:36 eileen: civicrm revision changed from {{Gerrit|f36a0b08f0}} to {{Gerrit|ad430721f6}}, config revision is {{Gerrit|26b02db7ba}}
* 00:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 00:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
* 00:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE


== 2021-03-23 ==
== 2022-06-20 ==
* 22:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 07:14 SandraEbele: Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Reliability and SpecialEntityData metrics).
* 22:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
* 07:14 SandraEbele: killed Oozie wikidata-articleplaceholder_metrics-coord, wikidata-reliability_metrics-coord, and wikidata-specialentitydata_metrics-coord jobs.
* 22:33 dwisehaupt: pushing {{Gerrit|60f9baaf50b}} to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - [[phab:T170321|T170321]]
* 22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s)
* 22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace
* 22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:05 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 21:27 ppchelko@deploy1002: Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s)
* 21:09 ppchelko@deploy1002: Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint
* 21:04 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:41 eileen: civicrm revision changed from {{Gerrit|39d24e8b0a}} to {{Gerrit|f36a0b08f0}}, config revision is {{Gerrit|26b02db7ba}}
* 20:24 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:24 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:21 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts auth1002.eqiad.wmnet
* 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
* 20:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts auth1002.eqiad.wmnet
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
* 19:51 jforrester@deploy1002: Finished deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans (duration: 00m 08s)
* 19:51 jforrester@deploy1002: Started deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans
* 18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove schema overrides for 6 finished EL migrations - [[phab:T267347|T267347]] [[phab:T271164|T271164]] [[phab:T267351|T267351]] [[phab:T267348|T267348]] [[phab:T267343|T267343]] [[phab:T267353|T267353]] (duration: 01m 07s)
* 18:40 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/vendor/: Bump wikimedia/parsoid to 0.13.0-a29 (duration: 01m 16s)
* 18:20 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:18 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:16 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:10 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add irc2001.wikimedia.org (running buster) as second irc server ([[phab:T224579|T224579]]) (duration: 01m 08s)
* 15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:32 moritzm: installing libsdl2 security updates
* 15:31 akosiaris: pool echostore for eqiad (the first of the larger services traffic-wise)
* 15:31 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=echostore
* 15:25 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete ([[phab:T274200|T274200]])
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:43 akosiaris: pool more services in eqiad k8s. [[phab:T277741|T277741]]. Only the very large ones traffic-wise are still on codfw
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=recommendation-api
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=push-notifications
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=proton
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mobileapps
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=linkrecommendation
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams-internal
* 14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams
* 14:20 akosiaris: pool a few more services in eqiad k8s. [[phab:T277741|T277741]]
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=wikifeeds
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=termbox
* 14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=similar-users
* 14:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36
* 14:06 akosiaris: pool a few services in eqiad k8s. [[phab:T277741|T277741]]
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=blubberoid
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=api-gateway
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apertium
* 14:05 moritzm: installing pygments security updates on stretch
* 14:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2008.codfw.wmnet
* 13:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2008.codfw.wmnet
* 13:55 hashar@deploy1002: Finished scap: Promote testwikis from 1.36.0-wmf.35 to 1.36.0-wmf.36 - [[phab:T274940|T274940]] (duration: 31m 57s)
* 13:54 elukey: sudo systemctl reload apache2 on prometheus[12]00[34] to pick up new k8s-mlserve instance settings
* 13:28 moritzm: drain ganeti2008
* 13:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
* 13:23 hashar@deploy1002: Started scap: Promote testwikis from 1.36.0-wmf.35 to 1.36.0-wmf.36 - [[phab:T274940|T274940]]
* 13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
* 13:15 ema: cp3054: install varnishkafka built explicitly against varnish 6.0.1-1wm2 to fix broken dpkg status [[phab:T264398|T264398]]
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 100%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15054 and previous config saved to /var/cache/conftool/dbconfig/20210323-130543-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15053 and previous config saved to /var/cache/conftool/dbconfig/20210323-130153-root.json
* 12:58 moritzm: drain ganeti2018
* 12:58 akosiaris: remove and decommission argon, chlorine, acrab, acrux [[phab:T277741|T277741]], [[phab:T277191|T277191]]
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15052 and previous config saved to /var/cache/conftool/dbconfig/20210323-125155-root.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15051 and previous config saved to /var/cache/conftool/dbconfig/20210323-125039-root.json
* 12:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15050 and previous config saved to /var/cache/conftool/dbconfig/20210323-124650-root.json
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 85%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15049 and previous config saved to /var/cache/conftool/dbconfig/20210323-123651-root.json
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15048 and previous config saved to /var/cache/conftool/dbconfig/20210323-123535-root.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15047 and previous config saved to /var/cache/conftool/dbconfig/20210323-123146-root.json
* 12:27 moritzm: drain ganeti2017
* 12:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15046 and previous config saved to /var/cache/conftool/dbconfig/20210323-122148-root.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15045 and previous config saved to /var/cache/conftool/dbconfig/20210323-122032-root.json
* 12:17 akosiaris: remove all scheduled downtimes for k8s cluster. [[phab:T277741|T277741]]
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15044 and previous config saved to /var/cache/conftool/dbconfig/20210323-121642-root.json
* 12:09 moritzm: drain ganeti2016
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 60%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15043 and previous config saved to /var/cache/conftool/dbconfig/20210323-120644-root.json
* 11:55 moritzm: installing libcaca security updates
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15042 and previous config saved to /var/cache/conftool/dbconfig/20210323-115141-root.json
* 11:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aqs[1012-1015].eqiad.wmnet with reason: New buster hosts, not in use
* 11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on aqs[1012-1015].eqiad.wmnet with reason: New buster hosts, not in use
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 35%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15041 and previous config saved to /var/cache/conftool/dbconfig/20210323-113637-root.json
* 11:31 Lucas_WMDE: EU backport&config window done
* 11:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:674098{{!}}Enable DiscussionTools' beta features on dewiki (T276494)]] (duration: 00m 58s)
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15040 and previous config saved to /var/cache/conftool/dbconfig/20210323-112133-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 20%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15039 and previous config saved to /var/cache/conftool/dbconfig/20210323-110630-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P15038 and previous config saved to /var/cache/conftool/dbconfig/20210323-110553-marostegui.json
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15037 and previous config saved to /var/cache/conftool/dbconfig/20210323-110347-root.json
* 11:01 moritzm: installing tomcat8 security updates
* 10:56 jayme: all services re-deployed to k8s eqiad - [[phab:T277741|T277741]]
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 15%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15036 and previous config saved to /var/cache/conftool/dbconfig/20210323-105126-root.json
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15035 and previous config saved to /var/cache/conftool/dbconfig/20210323-104843-root.json
* 10:46 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 10:46 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 10:43 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 10:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
* 10:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:41 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 10:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:37 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:37 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15034 and previous config saved to /var/cache/conftool/dbconfig/20210323-103623-root.json
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15033 and previous config saved to /var/cache/conftool/dbconfig/20210323-103340-root.json
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
* 10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:31 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:31 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:27 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:27 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:26 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 10:26 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:25 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:25 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:24 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:22 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubesvc
* 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:21 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:21 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 5%: Slowly pool db1165 into s6 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15031 and previous config saved to /var/cache/conftool/dbconfig/20210323-102119-root.json
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:19 hashar@deploy1002: Pruned MediaWiki: 1.36.0-wmf.33 (duration: 01m 48s)
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15030 and previous config saved to /var/cache/conftool/dbconfig/20210323-101836-root.json
* 10:16 hashar@deploy1002: Pruned MediaWiki: 1.36.0-wmf.32 (duration: 14m 47s)
* 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1005.eqiad.wmnet
* 10:02 hashar: scap clean --delete 1.36.0-wmf.32  # [[phab:T274940|T274940]]
* 10:01 hashar: Applied security patches for 1.36.0-wmf.36 # [[phab:T274940|T274940]]
* 09:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1006.eqiad.wmnet
* 09:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1015.eqiad.wmnet
* 09:54 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1006.eqiad.wmnet
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1165 into s6 with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15029 and previous config saved to /var/cache/conftool/dbconfig/20210323-095437-marostegui.json
* 09:54 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1016.eqiad.wmnet
* 09:53 akosiaris: deploy helmfile.d/admin_ng for eqiad [[phab:T277741|T277741]]
* 09:53 hashar: scap prep 1.36.0-wmf.36 # [[phab:T274940|T274940]]
* 09:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:53 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=kubesvc,name=kubernetes2017.codfw.wmnet
* 09:53 jayme@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=kubesvc,name=kubernetes2017.codfw.wmnet
* 09:51 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:50 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubesvc,name=kubernetes1017.eqiad.wmnet
* 09:50 jayme@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=kubesvc,name=kubernetes1017.eqiad.wmnet
* 09:49 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:46 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:46 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:45 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: REIMAGE
* 09:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 09:44 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 09:43 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: REIMAGE
* 09:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: REIMAGE
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1165 into s6 with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15028 and previous config saved to /var/cache/conftool/dbconfig/20210323-094257-marostegui.json
* 09:41 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: REIMAGE
* 09:41 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1016.eqiad.wmnet
* 09:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: REIMAGE
* 09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1015.eqiad.wmnet
* 09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1005.eqiad.wmnet
* 09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: REIMAGE
* 09:38 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: REIMAGE
* 09:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: REIMAGE
* 09:36 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: REIMAGE
* 09:36 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE
* 09:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE
* 09:34 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: REIMAGE
* 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1017.eqiad.wmnet
* 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: REIMAGE
* 09:32 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1165 to dbctl, depooled - [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15027 and previous config saved to /var/cache/conftool/dbconfig/20210323-093246-marostegui.json
* 09:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: REIMAGE
* 09:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE
* 09:30 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: REIMAGE
* 09:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1003.eqiad.wmnet with reason: REIMAGE
* 09:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1002.eqiad.wmnet with reason: REIMAGE
* 09:28 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: REIMAGE
* 09:28 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1003.eqiad.wmnet with reason: REIMAGE
* 09:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1002.eqiad.wmnet with reason: REIMAGE
* 09:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1001.eqiad.wmnet with reason: REIMAGE
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 to clone db1181 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15025 and previous config saved to /var/cache/conftool/dbconfig/20210323-092600-marostegui.json
* 09:24 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1001.eqiad.wmnet with reason: REIMAGE
* 09:18 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dc=eqiad,cluster=kubernetes,name=kubernetes1017.eqiad.wmnet
* 09:17 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubemaster,cluster=kubernetes
* 09:17 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=kubemaster,cluster=kubernetes
* 09:16 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1017.eqiad.wmnet
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P15024 and previous config saved to /var/cache/conftool/dbconfig/20210323-091432-marostegui.json
* 09:05 akosiaris: reboot kubetcd100[456] for kernel upgrades. [[phab:T277741|T277741]] [[phab:T273278|T273278]]
* 09:04 akosiaris: empty etcd [[phab:T277741|T277741]]
* 08:43 akosiaris: poweroff argon and chlorine [[phab:T277741|T277741]]
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15023 and previous config saved to /var/cache/conftool/dbconfig/20210323-083957-root.json
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=wikifeeds
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=termbox
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=similar-users
* 08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=recommendation-api
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=push-notifications
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=proton
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mobileapps
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=linkrecommendation
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams-internal
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams
* 08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-main
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=echostore
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=api-gateway
* 08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=apertium
* 08:33 akosiaris: eqiad services in k8s depooled. [[phab:T277741|T277741]]
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=wikifeeds
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=termbox
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=similar-users
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=recommendation-api
* 08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=push-notifications
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=proton
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mobileapps
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=linkrecommendation
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams-internal
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-main
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-logging-external
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics-external
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=echostore
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=api-gateway
* 08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=apertium
* 08:28 akosiaris: downtime all services in [[phab:T277741|T277741]] for 24H
* 08:25 akosiaris: beginning the k8s upgrade/reinit process. [[phab:T277741|T277741]]
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15022 and previous config saved to /var/cache/conftool/dbconfig/20210323-082454-root.json
* 08:24 moritzm: installing mariadb-10.3 updates on buster (just client-side libs/tools, unrelated to the main wmf-mariadb packages)
* 08:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize eqiad k8s cluster with new etcd
* 08:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize eqiad k8s cluster with new etcd
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15021 and previous config saved to /var/cache/conftool/dbconfig/20210323-082213-root.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15020 and previous config saved to /var/cache/conftool/dbconfig/20210323-080949-root.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15019 and previous config saved to /var/cache/conftool/dbconfig/20210323-080709-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15017 and previous config saved to /var/cache/conftool/dbconfig/20210323-075445-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15016 and previous config saved to /var/cache/conftool/dbconfig/20210323-075253-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15015 and previous config saved to /var/cache/conftool/dbconfig/20210323-075230-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15014 and previous config saved to /var/cache/conftool/dbconfig/20210323-075216-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15013 and previous config saved to /var/cache/conftool/dbconfig/20210323-075206-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15012 and previous config saved to /var/cache/conftool/dbconfig/20210323-073726-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15011 and previous config saved to /var/cache/conftool/dbconfig/20210323-073713-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15010 and previous config saved to /var/cache/conftool/dbconfig/20210323-073702-root.json
* 07:36 elukey: create a 50g lvm volume on prometheus[12]00[34] for the k8s-mlserve cluster - [[phab:T272918|T272918]]
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 100%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15009 and previous config saved to /var/cache/conftool/dbconfig/20210323-072352-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15008 and previous config saved to /var/cache/conftool/dbconfig/20210323-072223-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15007 and previous config saved to /var/cache/conftool/dbconfig/20210323-072209-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15006 and previous config saved to /var/cache/conftool/dbconfig/20210323-070849-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15005 and previous config saved to /var/cache/conftool/dbconfig/20210323-070719-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15004 and previous config saved to /var/cache/conftool/dbconfig/20210323-070705-root.json
* 07:02 marostegui: Upgrade kernel on db1101
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15003 and previous config saved to /var/cache/conftool/dbconfig/20210323-065947-marostegui.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15002 and previous config saved to /var/cache/conftool/dbconfig/20210323-065836-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15001 and previous config saved to /var/cache/conftool/dbconfig/20210323-065345-root.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15000 and previous config saved to /var/cache/conftool/dbconfig/20210323-063842-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14999 and previous config saved to /var/cache/conftool/dbconfig/20210323-062942-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 10%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P14998 and previous config saved to /var/cache/conftool/dbconfig/20210323-062338-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086', diff saved to https://phabricator.wikimedia.org/P14997 and previous config saved to /var/cache/conftool/dbconfig/20210323-062059-marostegui.json
* 06:20 marostegui: Upgrade kernel on db1086
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P14996 and previous config saved to /var/cache/conftool/dbconfig/20210323-060701-root.json
* 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1136 to s7 master and remove read-only from s7 [[phab:T274336|T274336]]', diff saved to https://phabricator.wikimedia.org/P14995 and previous config saved to /var/cache/conftool/dbconfig/20210323-060216-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s7 as read-only for maintenance [[phab:T274336|T274336]]', diff saved to https://phabricator.wikimedia.org/P14994 and previous config saved to /var/cache/conftool/dbconfig/20210323-060104-marostegui.json
* 06:00 marostegui: Starting s7 eqiad failover from db1086 to db1136 - [[phab:T274336|T274336]]
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1174 to api [[phab:T274336|T274336]]', diff saved to https://phabricator.wikimedia.org/P14993 and previous config saved to /var/cache/conftool/dbconfig/20210323-051346-marostegui.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1136 before failover [[phab:T274336|T274336]]', diff saved to https://phabricator.wikimedia.org/P14992 and previous config saved to /var/cache/conftool/dbconfig/20210323-051210-marostegui.json
* 00:07 tstarling@deploy1002: Synchronized wmf-config: use RequestTimeout library step 3: clean up (duration: 00m 58s)
* 00:06 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: use RequestTimeout library step 2: enable new system (duration: 00m 57s)
* 00:04 tstarling@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: use RequestTimeout library step 1: disable old request timeout system (duration: 00m 58s)


== 2022-06-19 ==
* 10:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 10:14 ayounsi@cumin1001: dbctl commit (dc=all): 'depool', diff saved to https://phabricator.wikimedia.org/P29910 and previous config saved to /var/cache/conftool/dbconfig/20220619-101436-ayounsi.json


== 2021-03-22 ==
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 23:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2250.codfw.wmnet
* 23:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:18 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: [[phab:T262612|T262612]]: Start glent m1 ab test (duration: 01m 53s)
* 23:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 23:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2250.codfw.wmnet
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2249.codfw.wmnet
* 22:52 mutante: decom mw2249
* 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2249.codfw.wmnet
* 21:08 sbassett: Deployed security patch for [[phab:T272244|T272244]]
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet,service=canary
* 20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet,service=canary
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2279.codfw.wmnet,service=canary
* 20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2278.codfw.wmnet,service=canary
* 19:50 mutante: gerrit2001 - restarted apache2 as well for consistency
* 19:47 mutante: gerrit - restarting apache2 after we dropped MaxClients config line. This should make us fall back to Debian default MaxRequestWorkers. (since we use event MPM we should not be using MaxClients in the first place, says #httpd) ([[phab:T277127|T277127]])
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|25247c9cbba3d3741908164f2d15fb8497ce8b5e}}: hrwiki: Configure mentorship for Growth team features ([[phab:T275684|T275684]]) (duration: 01m 00s)
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|951601f7a4c887f21e209b32dbd1cfd3da084816}}: Grant enwiki pagemovers the delete-redirect right ([[phab:T278131|T278131]]) (duration: 00m 59s)
* 17:30 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T274200|T274200]])
* 16:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:47 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:46 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14990 and previous config saved to /var/cache/conftool/dbconfig/20210322-155808-root.json
* 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14989 and previous config saved to /var/cache/conftool/dbconfig/20210322-154304-root.json
* 15:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14988 and previous config saved to /var/cache/conftool/dbconfig/20210322-152800-root.json
* 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14987 and previous config saved to /var/cache/conftool/dbconfig/20210322-151257-root.json
* 14:26 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:23 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P14986 and previous config saved to /var/cache/conftool/dbconfig/20210322-141146-marostegui.json
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14985 and previous config saved to /var/cache/conftool/dbconfig/20210322-140800-root.json
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad - [[phab:T277771|T277771]]
* 14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14984 and previous config saved to /var/cache/conftool/dbconfig/20210322-135256-root.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14983 and previous config saved to /var/cache/conftool/dbconfig/20210322-133753-root.json
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14982 and previous config saved to /var/cache/conftool/dbconfig/20210322-132249-root.json
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:16 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 12:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:20 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P14981 and previous config saved to /var/cache/conftool/dbconfig/20210322-121924-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14980 and previous config saved to /var/cache/conftool/dbconfig/20210322-112954-root.json
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14979 and previous config saved to /var/cache/conftool/dbconfig/20210322-112707-root.json
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 11:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 11:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14978 and previous config saved to /var/cache/conftool/dbconfig/20210322-111451-root.json
* 11:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14977 and previous config saved to /var/cache/conftool/dbconfig/20210322-111203-root.json
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14976 and previous config saved to /var/cache/conftool/dbconfig/20210322-105947-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14975 and previous config saved to /var/cache/conftool/dbconfig/20210322-105700-root.json
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:51 moritzm: installing libdbi-perl security updates
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14974 and previous config saved to /var/cache/conftool/dbconfig/20210322-104443-root.json
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14973 and previous config saved to /var/cache/conftool/dbconfig/20210322-104156-root.json
* 10:42 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:673979{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:17 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:12 elukey: run homer for cr1/cr2 eqiad and codfw to add new iBGP session for the k8s ML clusters - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/661055
* 09:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config cleanup (duration: 00m 57s)
* 09:49 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config cleanup (duration: 00m 59s)
* 09:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config cleanup (duration: 01m 20s)
* 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for schema change', diff saved to https://phabricator.wikimedia.org/P14971 and previous config saved to /var/cache/conftool/dbconfig/20210322-093558-marostegui.json
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14970 and previous config saved to /var/cache/conftool/dbconfig/20210322-091534-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14969 and previous config saved to /var/cache/conftool/dbconfig/20210322-090030-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14968 and previous config saved to /var/cache/conftool/dbconfig/20210322-084527-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14967 and previous config saved to /var/cache/conftool/dbconfig/20210322-083023-root.json
* 08:13 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]] [[phab:T268435|T268435]]
* 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 08:02 jayme: build and release docker-registry.discovery.wmnet/eventrouter:0.3.0-6, docker-registry.discovery.wmnet/fluent-bit:1.5.3-3, docker-registry.discovery.wmnet/ratelimit:1.5.1-s3
* 08:00 marostegui: Stop MySQL on db1085 to clone db1165 (lag will appear on s6 on wiki replicas) [[phab:T258361|T258361]]
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 to clone db1165', diff saved to https://phabricator.wikimedia.org/P14965 and previous config saved to /var/cache/conftool/dbconfig/20210322-080020-marostegui.json
* 07:51 elukey: stop/start mariadb instances on dbstore1004 to reduce buffer pool memory settings - [[phab:T273865|T273865]]
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14964 and previous config saved to /var/cache/conftool/dbconfig/20210322-073747-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14963 and previous config saved to /var/cache/conftool/dbconfig/20210322-072243-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change', diff saved to https://phabricator.wikimedia.org/P14962 and previous config saved to /var/cache/conftool/dbconfig/20210322-071430-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14961 and previous config saved to /var/cache/conftool/dbconfig/20210322-070740-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14960 and previous config saved to /var/cache/conftool/dbconfig/20210322-065236-root.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl [[phab:T276302|T276302]]', diff saved to https://phabricator.wikimedia.org/P14959 and previous config saved to /var/cache/conftool/dbconfig/20210322-063732-marostegui.json
* 06:11 marostegui: Sanitize db1124 db2094 db1154: taywiki trvwiki mnwwiktionary
* 04:28 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .


== 2021-03-21 ==
* 10:25 _joe_: restarting gerrit on gerrit1001, using 45G of reserved memory
* 09:22 elukey: install apache2-bin-dbgsym on gerrit1001 - [[phab:T277127|T277127]]
* 08:50 qchris: Restarting apache on gerrit1001 again (all apache workers busy again) see [[phab:T277127|T277127]]
* 08:18 qchris: Restarting apache on gerrit1001 (all apache workers busy)


== 2022-06-17 ==
* 22:05 AndyRussG: update payments-wiki revision {{Gerrit|10304f69}} -> {{Gerrit|ef53c82e}}
* 20:22 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P29908 and previous config saved to /var/cache/conftool/dbconfig/20220617-202240-jynus.json
* 20:20 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P29907 and previous config saved to /var/cache/conftool/dbconfig/20220617-202038-jynus.json
* 17:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1021.eqiad.wmnet with OS buster
* 17:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 17:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 16:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1020.eqiad.wmnet with OS buster
* 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 16:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1019.eqiad.wmnet with OS buster
* 16:37 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 16:35 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 16:34 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 16:34 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 16:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 16:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 16:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 16:22 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 16:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 16:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 16:06 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 16:06 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
* 16:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 15:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 15:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 15:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 15:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 15:46 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS buster
* 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 15:39 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 15:39 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 15:32 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS buster
* 15:29 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 15:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 15:19 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 15:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 15:18 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 15:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 15:15 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS buster
* 15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 15:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 15:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 15:02 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 14:59 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 14:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 14:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 12:35 SandraEbele: deployed daily airflow dag for 3 Wikidata metrics.
* 11:54 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@18182aa]: (no justification provided) (duration: 00m 13s)
* 11:54 ebysans@deploy1002: Started deploy [airflow-dags/analytics@18182aa]: (no justification provided)
* 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2012.codfw.wmnet
* 11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2012.codfw.wmnet
* 11:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2011.codfw.wmnet
* 11:40 moritzm: upload cas 6.5.5+wmf11u1 to apt.wikimedia.org [[phab:T305518|T305518]]
* 11:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2011.codfw.wmnet
* 11:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2010.codfw.wmnet
* 11:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 11:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 11:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 11:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 11:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 11:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 11:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2010.codfw.wmnet
* 11:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1012.eqiad.wmnet
* 11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1012.eqiad.wmnet
* 11:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1011.eqiad.wmnet
* 11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1010.eqiad.wmnet
* 11:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1010.eqiad.wmnet
* 10:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 10:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 10:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 10:34 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 10:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 10:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 10:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
* 09:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
* 09:56 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 09:56 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 09:55 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 09:55 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 09:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 09:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
* 09:44 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
* 09:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
* 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1004.eqiad.wmnet
* 09:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1004.eqiad.wmnet
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
* 09:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
* 09:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
* 09:24 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 09:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
* 09:11 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 09:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 09:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 08:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 08:51 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 08:47 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 08:39 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 08:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 08:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
* 08:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
* 08:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
* 08:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
* 07:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 07:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 02:51 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS bullseye
* 02:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:36 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 03m 43s)
* 02:02 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS bullseye
* 01:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS bullseye
* 01:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 01:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
* 00:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 00:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 00:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye


== 2021-03-20 ==
== 2022-06-16 ==
* 00:22 tzatziki: altering emails for STei (WMF) and SGrabarczuk (WMF)
* 23:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 23:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 23:38 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 23:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
* 22:59 mutante: new Wikipedia languages added to DNS: blk = https://en.wikipedia.org/wiki/Pa%27O_language {{!}} pcm = https://en.wikipedia.org/wiki/Nigerian_Pidgin
* 22:37 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:33 volans@cumin2002: START - Cookbook sre.dns.netbox
* 21:18 thcipriani@deploy1002: Finished scap: noop test (duration: 04m 07s)
* 21:14 thcipriani@deploy1002: Started scap: noop test
* 21:10 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:805433{{!}}CommonSettings: clean up and simplify some code]] (duration: 03m 42s)
* 21:06 thcipriani@deploy1002: Synchronized multiversion/MWRealm.php: Config: [[gerrit:806249{{!}}MWRealm.php: remove unused getRealmSpecificFilename() (T171115)]] (duration: 03m 35s)
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 thcipriani@deploy1002: Finished scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]] (duration: 16m 57s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:42 thcipriani@deploy1002: Started scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]]
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 cjming@deploy1002: Synchronized phpcs.xml: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 27s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:23 cjming@deploy1002: Synchronized wmf-config/: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 22s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805179{{!}}Turn off TOC A/B test for pilot wikis (T309683)]] (duration: 03m 37s)
* 19:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner2001.codfw.wmnet
* 19:39 aokoth@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:23 aokoth@cumin1001: START - Cookbook sre.dns.netbox
* 19:03 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner2001.codfw.wmnet
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab-runner1001.eqiad.wmnet
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29904 and previous config saved to /var/cache/conftool/dbconfig/20220616-185520-marostegui.json
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:54 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
* 18:53 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:50 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:49 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.16 refs [[phab:T308069|T308069]]
* 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:44 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to all wikis
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:42 brennen@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/CheckUser/src/Hooks.php: Backport: [[gerrit:806246{{!}}Only try to create User object if username is not null (T310747)]] (duration: 03m 23s)
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29903 and previous config saved to /var/cache/conftool/dbconfig/20220616-184015-marostegui.json
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29902 and previous config saved to /var/cache/conftool/dbconfig/20220616-182510-marostegui.json
* 18:13 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:11 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29901 and previous config saved to /var/cache/conftool/dbconfig/20220616-181005-marostegui.json
* 18:10 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 17:59 brennen: end of phabricator deploy
* 17:46 brennen: starting phabricator deploy, momentary downtime expected while services restart
* 17:42 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:42 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29900 and previous config saved to /var/cache/conftool/dbconfig/20220616-173738-marostegui.json
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29899 and previous config saved to /var/cache/conftool/dbconfig/20220616-173725-marostegui.json
* 17:31 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:31 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:27 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:26 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29898 and previous config saved to /var/cache/conftool/dbconfig/20220616-172220-marostegui.json
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29897 and previous config saved to /var/cache/conftool/dbconfig/20220616-170715-marostegui.json
* 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29896 and previous config saved to /var/cache/conftool/dbconfig/20220616-165210-marostegui.json
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29895 and previous config saved to /var/cache/conftool/dbconfig/20220616-161844-marostegui.json
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29894 and previous config saved to /var/cache/conftool/dbconfig/20220616-161835-marostegui.json
* 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29893 and previous config saved to /var/cache/conftool/dbconfig/20220616-160330-marostegui.json
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29892 and previous config saved to /var/cache/conftool/dbconfig/20220616-154825-marostegui.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29891 and previous config saved to /var/cache/conftool/dbconfig/20220616-153320-marostegui.json
* 15:31 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 15:30 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 15:29 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P29890 and previous config saved to /var/cache/conftool/dbconfig/20220616-151434-ladsgroup.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P29889 and previous config saved to /var/cache/conftool/dbconfig/20220616-145931-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29888 and previous config saved to /var/cache/conftool/dbconfig/20220616-145136-marostegui.json
* 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29887 and previous config saved to /var/cache/conftool/dbconfig/20220616-145128-marostegui.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P29886 and previous config saved to /var/cache/conftool/dbconfig/20220616-144427-ladsgroup.json
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29885 and previous config saved to /var/cache/conftool/dbconfig/20220616-143623-marostegui.json
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P29884 and previous config saved to /var/cache/conftool/dbconfig/20220616-142923-ladsgroup.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29883 and previous config saved to /var/cache/conftool/dbconfig/20220616-142118-marostegui.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29882 and previous config saved to /var/cache/conftool/dbconfig/20220616-140613-marostegui.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29881 and previous config saved to /var/cache/conftool/dbconfig/20220616-140453-root.json
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 volans@cumin1001: dbctl commit (dc=all): 'Doesn't have new wikiuser', diff saved to https://phabricator.wikimedia.org/P29880 and previous config saved to /var/cache/conftool/dbconfig/20220616-140107-volans.json
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29879 and previous config saved to /var/cache/conftool/dbconfig/20220616-134950-root.json
* 13:45 sukhe: upload bird2_2.0.7-4.1wm1 to apt.wm.o (buster) - [[phab:T310574|T310574]]
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29878 and previous config saved to /var/cache/conftool/dbconfig/20220616-133446-root.json
* 13:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1089.eqiad.wmnet
* 13:22 jayme@cumin1001: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 13:21 jayme@cumin1001: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29877 and previous config saved to /var/cache/conftool/dbconfig/20220616-131942-root.json
* 13:10 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 13:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29876 and previous config saved to /var/cache/conftool/dbconfig/20220616-130438-root.json
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 13:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29875 and previous config saved to /var/cache/conftool/dbconfig/20220616-123357-marostegui.json
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1008.eqiad.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 for schema change', diff saved to https://phabricator.wikimedia.org/P29874 and previous config saved to /var/cache/conftool/dbconfig/20220616-115924-root.json
* 11:53 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1008.eqiad.wmnet
* 11:53 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1007.eqiad.wmnet
* 11:45 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1007.eqiad.wmnet
* 11:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
* 11:38 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
* 11:35 godog: trim swift logs older than 25d from centrallog hosts - [[phab:T309171|T309171]]
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet
* 11:27 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet
* 11:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
* 11:19 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29873 and previous config saved to /var/cache/conftool/dbconfig/20220616-111632-marostegui.json
* 11:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 11:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
* 11:09 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 11:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 11:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29871 and previous config saved to /var/cache/conftool/dbconfig/20220616-110127-marostegui.json
* 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
* 10:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29870 and previous config saved to /var/cache/conftool/dbconfig/20220616-104622-marostegui.json
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
* 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
* 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
* 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json
* 10:28 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:28 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]?
* 10:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]?
* 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster
* 10:02 elukey: ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]
* 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:32 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster
* 09:21 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29868 and previous config saved to /var/cache/conftool/dbconfig/20220616-091131-marostegui.json
* 09:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:02 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6002.drmrs.wmnet
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 08:45 moritzm: failover ganeti master in drmrs/2 to ganeti6004
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805370{{!}}testwiki: Enable SectionTranslation for 11 Wikipedias (T309384 T310116)]] (duration: 03m 41s)
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 joal: Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure


== 2021-03-19 ==
== 2022-06-15 ==
* 21:11 mutante: scandium - stop apache and rerun puppet which fails after reimaging because it tries to run an nginx on port 80 which is already used by apache [[phab:T268248|T268248]]
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29867 and previous config saved to /var/cache/conftool/dbconfig/20220615-224845-marostegui.json
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29866 and previous config saved to /var/cache/conftool/dbconfig/20220615-223339-marostegui.json
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
* 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS buster
* 20:15 mutante: scandium - reimaging with buster
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29865 and previous config saved to /var/cache/conftool/dbconfig/20220615-221834-marostegui.json
* 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS buster
* 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2245.codfw.wmnet
* 22:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 19:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2245.codfw.wmnet
* 22:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2244.codfw.wmnet
* 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 19:53 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1002.wikimedia.org
* 22:12 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 19:50 mutante: testreduce1001 - confirmed MariaDB @@datadir is /srv/data/mysql and deleting /var/lib/mysql ([[phab:T277580|T277580]])
* 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 19:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2244.codfw.wmnet
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29864 and previous config saved to /var/cache/conftool/dbconfig/20220615-220329-marostegui.json
* 19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2245.codfw.wmnet
* 22:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 19:39 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1002.wikimedia.org
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2244.codfw.wmnet
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet,service=canary
* 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet,service=canary
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29863 and previous config saved to /var/cache/conftool/dbconfig/20220615-213241-marostegui.json
* 19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2252.codfw.wmnet,service=canary
* 21:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2251.codfw.wmnet,service=canary
* 21:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 19:24 mutante: deploy2002 - re-enabled puppet, reverted patch of scap-sync-master
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29862 and previous config saved to /var/cache/conftool/dbconfig/20220615-213233-marostegui.json
* 18:46 mutante: deploy2002 - disable puppet, copy modified version of scap-master-sync over it that does not --exclude="**/cache/l10n/*.cdb"  (for [[phab:T275826|T275826]])
* 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29861 and previous config saved to /var/cache/conftool/dbconfig/20220615-211728-marostegui.json
* 16:01 effie: upgrade memcached on mc-gp200*
* 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29860 and previous config saved to /var/cache/conftool/dbconfig/20220615-210223-marostegui.json
* 12:36 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29859 and previous config saved to /var/cache/conftool/dbconfig/20220615-204717-marostegui.json
* 12:34 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:10 effie: upgrade memcached on mc1026,mc2026
* 20:08 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804014{{!}}Remove unused setting wgQuickSurveysUseVue (T285890)]] (duration: 03m 38s)
* 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:36 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:36 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 19:50 hashar@deploy1002: Finished deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]] (duration: 00m 10s)
* 11:30 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 19:50 hashar@deploy1002: Started deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]]
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29858 and previous config saved to /var/cache/conftool/dbconfig/20220615-194703-marostegui.json
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 19:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29857 and previous config saved to /var/cache/conftool/dbconfig/20220615-194655-marostegui.json
* 11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29856 and previous config saved to /var/cache/conftool/dbconfig/20220615-193150-marostegui.json
* 11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:31 hashar: wikibugs IRC bot has been restarted by valhallasw \o/ # [[phab:T310734|T310734]]
* 11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29855 and previous config saved to /var/cache/conftool/dbconfig/20220615-191645-marostegui.json
* 11:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29854 and previous config saved to /var/cache/conftool/dbconfig/20220615-190140-marostegui.json
* 11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
* 18:42 hashar: wikibugs (IRC bot for Phabricator/Gerrit) is no longer working and needs a restart [[phab:T310734|T310734]]
* 10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29853 and previous config saved to /var/cache/conftool/dbconfig/20220615-182140-marostegui.json
* 10:42 moritzm: installing dbmonitor1002 [[phab:T224589|T224589]]
* 18:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:11 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.16 refs [[phab:T308069|T308069]] (duration: 03m 43s)
* 10:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:05 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 09:36 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:22 elukey: upload alluxio 2.4.1 to thirdparty/bigtop15 on stretch/buster-wikimedia
* 17:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1015.eqiad.wmnet with OS buster
* 07:16 ryankemper: [[phab:T275885|T275885]] `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (change hadn't been merged when I ran the agent earlier)
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 04:04 eileen: civicrm revision changed from {{Gerrit|99bf1c9210}} to {{Gerrit|39d24e8b0a}}, config revision is {{Gerrit|26b02db7ba}}
* 17:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 03:27 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
* 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 03:26 ryankemper: [[phab:T275885|T275885]] `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo run-puppet-agent'`
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 02:43 ryankemper: [[phab:T275885|T275885]] Revoking current `relforge` TLS cert in advance of generation of new cert: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean relforge.svc.eqiad.wmnet`
* 17:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 00:51 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: [[phab:T277772|T277772]] (duration: 00m 58s)
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 00:45 mutante: testreduce1001 - stop mysql; rsyncing /var/lib/mysql to /srv/data/mysql ([[phab:T277580|T277580]])
* 17:52 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1014.eqiad.wmnet with OS buster
* 17:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 17:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29851 and previous config saved to /var/cache/conftool/dbconfig/20220615-172738-marostegui.json
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29849 and previous config saved to /var/cache/conftool/dbconfig/20220615-171233-marostegui.json
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29848 and previous config saved to /var/cache/conftool/dbconfig/20220615-165727-marostegui.json
* 16:54 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to group0
* 16:44 jynus: restarting replication for m3 on db1117, not db2078
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29847 and previous config saved to /var/cache/conftool/dbconfig/20220615-164222-marostegui.json
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:29 brennen: phabricator upgrade finished
* 16:27 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Id8cdb8aef70f6672}} (duration: 03m 41s)
* 16:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host backup1009.eqiad.wmnet
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host backup1009.eqiad.wmnet
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29845 and previous config saved to /var/cache/conftool/dbconfig/20220615-160838-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29844 and previous config saved to /var/cache/conftool/dbconfig/20220615-160830-marostegui.json
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS buster
* 15:56 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:53 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P29843 and previous config saved to /var/cache/conftool/dbconfig/20220615-155325-marostegui.json
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 15:40 mutante: phabricator upgrade in progress
* 15:39 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:39 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20220615-153820-marostegui.json
* 15:35 brennen: starting phabricator deploy, momentary downtime expected while Apache restarts and migrations run
* 15:34 jynus: stopping replication for m3 on db1117, db2078
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29841 and previous config saved to /var/cache/conftool/dbconfig/20220615-152315-marostegui.json
* 15:20 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ms-be1059.eqiad.wmnet with OS bullseye
* 15:20 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 15:20 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 15:06 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:05 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:05 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:03 mutante: phabricator maintenance about to start
* 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 15:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 14:59 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 14:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:52 jbond@cumin1001: END (ERROR) - Cookbook sre.pdus.uptime (exit_code=97)
* 14:51 jbond@cumin1001: START - Cookbook sre.pdus.uptime
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29840 and previous config saved to /var/cache/conftool/dbconfig/20220615-145028-marostegui.json
* 14:50 urandom: ALTER-ing replication for codfw (Cassandra) expansion -- [[phab:T307641|T307641]]
* 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29839 and previous config saved to /var/cache/conftool/dbconfig/20220615-145020-marostegui.json
* 14:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:49 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:46 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29838 and previous config saved to /var/cache/conftool/dbconfig/20220615-143515-marostegui.json
* 14:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:30 hnowlan@deploy1002: Synchronized private/PrivateSettings.php: [[phab:T308670|T308670]] credentials to access the similar-users service (duration: 03m 32s)
* 14:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:22 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:21 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29836 and previous config saved to /var/cache/conftool/dbconfig/20220615-142010-marostegui.json
* 14:19 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 14:16 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS buster
* 14:10 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jnuche@deploy1002: Installation of scap version "4.9.4" completed for 558 hosts
* 14:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 14:08 jnuche@deploy1002: Installing scap version "4.9.4" for 558 hosts
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29834 and previous config saved to /var/cache/conftool/dbconfig/20220615-140505-marostegui.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:01 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:58 awight: EU afternoon backport window complete.
* 13:57 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/Translate/src/PageTranslation/DeleteTranslatableBundleSpecialPage.php: Backport: [[gerrit:805749{{!}}Fix deletion of translation pages outside of NS_MAIN namespace (T310440)]] (duration: 00m 32s)
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29833 and previous config saved to /var/cache/conftool/dbconfig/20220615-135508-root.json
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29832 and previous config saved to /var/cache/conftool/dbconfig/20220615-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29831 and previous config saved to /var/cache/conftool/dbconfig/20220615-135458-root.json
* 13:54 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:53 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:49 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29830 and previous config saved to /var/cache/conftool/dbconfig/20220615-134004-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29829 and previous config saved to /var/cache/conftool/dbconfig/20220615-133958-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29828 and previous config saved to /var/cache/conftool/dbconfig/20220615-133954-root.json
* 13:38 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:805745{{!}}Restore internal mechanism to use either back or close button (T310602)]] (duration: 00m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29827 and previous config saved to /var/cache/conftool/dbconfig/20220615-133334-marostegui.json
* 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29826 and previous config saved to /var/cache/conftool/dbconfig/20220615-133326-marostegui.json
* 13:31 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 01m 08s)
* 13:30 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 02m 06s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29825 and previous config saved to /var/cache/conftool/dbconfig/20220615-132500-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29824 and previous config saved to /var/cache/conftool/dbconfig/20220615-132454-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29823 and previous config saved to /var/cache/conftool/dbconfig/20220615-132450-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29822 and previous config saved to /var/cache/conftool/dbconfig/20220615-131820-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29821 and previous config saved to /var/cache/conftool/dbconfig/20220615-130956-root.json
* 13:09 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29820 and previous config saved to /var/cache/conftool/dbconfig/20220615-130951-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29819 and previous config saved to /var/cache/conftool/dbconfig/20220615-130946-root.json
* 13:08 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:04 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 43s)
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29818 and previous config saved to /var/cache/conftool/dbconfig/20220615-130315-marostegui.json
* 13:02 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 12:56 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 58s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:55 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 05s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29817 and previous config saved to /var/cache/conftool/dbconfig/20220615-125452-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29816 and previous config saved to /var/cache/conftool/dbconfig/20220615-125447-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29815 and previous config saved to /var/cache/conftool/dbconfig/20220615-125442-root.json
* 12:51 jbond@deploy1002: Finished deploy [netbox/deploy@7bbf659]: log (duration: 03m 12s)
* 12:48 jbond@deploy1002: Started deploy [netbox/deploy@7bbf659]: log
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29813 and previous config saved to /var/cache/conftool/dbconfig/20220615-124810-marostegui.json
* 12:42 moritzm: failover ganeti master in eqsin to ganeti5001
* 12:42 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:42 volans@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29812 and previous config saved to /var/cache/conftool/dbconfig/20220615-123949-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29811 and previous config saved to /var/cache/conftool/dbconfig/20220615-123943-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29810 and previous config saved to /var/cache/conftool/dbconfig/20220615-123938-root.json
* 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 12:25 kart_: Updated cxserver to 2022-06-15-074244-production ([[phab:T309266|T309266]], [[phab:T310116|T310116]], [[phab:T309384|T309384]], [[phab:T306963|T306963]])
* 12:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:23 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 es1033 es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29808 and previous config saved to /var/cache/conftool/dbconfig/20220615-122123-root.json
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:19 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29807 and previous config saved to /var/cache/conftool/dbconfig/20220615-121620-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29806 and previous config saved to /var/cache/conftool/dbconfig/20220615-121440-marostegui.json
* 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 12:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29805 and previous config saved to /var/cache/conftool/dbconfig/20220615-115935-marostegui.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29804 and previous config saved to /var/cache/conftool/dbconfig/20220615-115452-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29803 and previous config saved to /var/cache/conftool/dbconfig/20220615-115135-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29802 and previous config saved to /var/cache/conftool/dbconfig/20220615-115127-root.json
* 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29801 and previous config saved to /var/cache/conftool/dbconfig/20220615-114950-marostegui.json
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29800 and previous config saved to /var/cache/conftool/dbconfig/20220615-114430-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29799 and previous config saved to /var/cache/conftool/dbconfig/20220615-113948-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29798 and previous config saved to /var/cache/conftool/dbconfig/20220615-113631-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29797 and previous config saved to /var/cache/conftool/dbconfig/20220615-113623-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29796 and previous config saved to /var/cache/conftool/dbconfig/20220615-113445-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29795 and previous config saved to /var/cache/conftool/dbconfig/20220615-112924-marostegui.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29794 and previous config saved to /var/cache/conftool/dbconfig/20220615-112444-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29793 and previous config saved to /var/cache/conftool/dbconfig/20220615-112127-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29792 and previous config saved to /var/cache/conftool/dbconfig/20220615-112119-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29791 and previous config saved to /var/cache/conftool/dbconfig/20220615-111940-marostegui.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29790 and previous config saved to /var/cache/conftool/dbconfig/20220615-110940-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29789 and previous config saved to /var/cache/conftool/dbconfig/20220615-110623-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29788 and previous config saved to /var/cache/conftool/dbconfig/20220615-110616-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29787 and previous config saved to /var/cache/conftool/dbconfig/20220615-110435-marostegui.json
* 10:55 marostegui: dbmaint es3@eqiad [[phab:T310485|T310485]]
* 10:55 marostegui: dbmaint es2@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui: dbmaint es1@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29786 and previous config saved to /var/cache/conftool/dbconfig/20220615-105437-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29784 and previous config saved to /var/cache/conftool/dbconfig/20220615-105119-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29783 and previous config saved to /var/cache/conftool/dbconfig/20220615-105112-root.json
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29782 and previous config saved to /var/cache/conftool/dbconfig/20220615-103933-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29781 and previous config saved to /var/cache/conftool/dbconfig/20220615-103615-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29780 and previous config saved to /var/cache/conftool/dbconfig/20220615-103608-root.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29779 and previous config saved to /var/cache/conftool/dbconfig/20220615-103101-marostegui.json
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29778 and previous config saved to /var/cache/conftool/dbconfig/20220615-103048-marostegui.json
* 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 es1030 es1028 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29777 and previous config saved to /var/cache/conftool/dbconfig/20220615-102929-root.json
* 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29776 and previous config saved to /var/cache/conftool/dbconfig/20220615-101543-marostegui.json
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29775 and previous config saved to /var/cache/conftool/dbconfig/20220615-100235-marostegui.json
* 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P29774 and previous config saved to /var/cache/conftool/dbconfig/20220615-100037-marostegui.json
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4001.ulsfo.wmnet
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29773 and previous config saved to /var/cache/conftool/dbconfig/20220615-094532-marostegui.json
* 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
* 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
* 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29772 and previous config saved to /var/cache/conftool/dbconfig/20220615-092706-marostegui.json
* 09:20 marostegui: Reboot sanitarium hosts (db1154, db1155); wiki replicas will have lag
* 09:14 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1059.eqiad.wmnet with OS bullseye
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29771 and previous config saved to /var/cache/conftool/dbconfig/20220615-091257-marostegui.json
* 09:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29770 and previous config saved to /var/cache/conftool/dbconfig/20220615-091249-marostegui.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29769 and previous config saved to /var/cache/conftool/dbconfig/20220615-091201-marostegui.json
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29768 and previous config saved to /var/cache/conftool/dbconfig/20220615-085744-marostegui.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P29767 and previous config saved to /var/cache/conftool/dbconfig/20220615-085656-marostegui.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P29766 and previous config saved to /var/cache/conftool/dbconfig/20220615-084239-marostegui.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29765 and previous config saved to /var/cache/conftool/dbconfig/20220615-084151-marostegui.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29764 and previous config saved to /var/cache/conftool/dbconfig/20220615-084046-marostegui.json
* 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29763 and previous config saved to /var/cache/conftool/dbconfig/20220615-083554-root.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29762 and previous config saved to /var/cache/conftool/dbconfig/20220615-082734-marostegui.json
* 08:23 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:22 jnuche@deploy1002: Installation of scap version "4.9.3" completed for 557 hosts
* 08:22 jnuche@deploy1002: Installing scap version "4.9.3" for 557 hosts
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29761 and previous config saved to /var/cache/conftool/dbconfig/20220615-082050-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29760 and previous config saved to /var/cache/conftool/dbconfig/20220615-081744-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29759 and previous config saved to /var/cache/conftool/dbconfig/20220615-080546-root.json
* 08:03 XioNoX: re-enable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29758 and previous config saved to /var/cache/conftool/dbconfig/20220615-080240-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29757 and previous config saved to /var/cache/conftool/dbconfig/20220615-075042-root.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29756 and previous config saved to /var/cache/conftool/dbconfig/20220615-075024-marostegui.json
* 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29755 and previous config saved to /var/cache/conftool/dbconfig/20220615-074736-root.json
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29754 and previous config saved to /var/cache/conftool/dbconfig/20220615-073538-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29753 and previous config saved to /var/cache/conftool/dbconfig/20220615-073232-root.json
* 07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29752 and previous config saved to /var/cache/conftool/dbconfig/20220615-072352-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P29751 and previous config saved to /var/cache/conftool/dbconfig/20220615-072034-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29750 and previous config saved to /var/cache/conftool/dbconfig/20220615-071728-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29749 and previous config saved to /var/cache/conftool/dbconfig/20220615-070847-marostegui.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29748 and previous config saved to /var/cache/conftool/dbconfig/20220615-065342-marostegui.json
* 06:52 XioNoX: disable BGP to Telia in eqsin for optic replacement - [[phab:T300485|T300485]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29747 and previous config saved to /var/cache/conftool/dbconfig/20220615-063837-marostegui.json
* 06:02 marostegui: Reboot db[2071-2078] [[phab:T310485|T310485]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29746 and previous config saved to /var/cache/conftool/dbconfig/20220615-060153-marostegui.json
* 06:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29745 and previous config saved to /var/cache/conftool/dbconfig/20220615-054252-marostegui.json
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
* 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1173.eqiad.wmnet with OS bullseye
* 05:17 marostegui: dbmaint es5@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es4@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es3@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es2@codfw [[phab:T310485|T310485]]
* 05:17 marostegui: dbmaint es1@codfw [[phab:T310485|T310485]]
* 05:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage
* 05:03 marostegui: Reboot dbproxy1016 and dbproxy1021 [[phab:T310484|T310484]]
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 02:25 tstarling@deploy1002: Synchronized php-1.39.0-wmf.16/includes/cache/MessageCache.php: (no justification provided) (duration: 03m 36s)
* 02:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 02:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 02:17 tstarling@deploy1002: Synchronized php-1.39.0-wmf.15/includes/cache/MessageCache.php: [[phab:T310532|T310532]] (duration: 03m 29s)
* 02:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-03-18 ==
== 2022-06-14 ==
* 23:56 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Don't define a default icon ([[phab:T274199|T274199]]) (duration: 00m 57s)
* 23:52 mutante: gitlab-runner1001/1002 - clean revert not possible, icinga alerting about failed buildkitd service, manually deleting systemd unit and trying to clean up [[phab:T308271|T308271]]
* 23:38 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 00m 57s)
* 23:49 mutante: gitlab-runner1002 - systemctl restart docker; run-puppet-agent ; systemctl start buildkitd - fails though [[phab:T308271|T308271]]
* 23:35 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 00m 58s)
* 23:39 mutante: gitlab-runner1001 - systemctl start buildkitd
* 23:25 dduvall@deploy1002: Synchronized .pipeline: config: [[gerrit:673375{{!}}Use build environment HTTP proxy for APT sources (T277109)]] (duration: 01m 02s)
* 23:32 mutante: gitlab-runner1001 - restarting docker
* 23:06 brennen: train status: 1.36.0-wmf.35 ([[phab:T274939|T274939]]) stable on all wikis after deploy of hotfix for [[phab:T277795|T277795]]
* 23:08 mutante: disabling puppet in gitlab-runners (via cumin /disable-puppet) before deploying gerrit:791655 to provide gitlab-runners with buildkit and new docker network - [[phab:T308271|T308271]]
* 22:53 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/specials/SpecialContributions.php: Backport: [[gerrit:673115{{!}}ActorStore::getActorById - fall back to master. (T277795)]] (duration: 01m 07s)
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:30 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:29 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:25 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:37 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: (no justification provided) (duration: 01m 05s)
* 22:15 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|e3fe6c04c95717f0f914bbfa366f5f827f392b6b}}: phpcs: fix more SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 39s)
* 19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.35
* 22:05 urbanecm@deploy1002: Synchronized w/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 18s)
* 18:28 legoktm: re-enabled puppet on registry*
* 22:02 urbanecm@deploy1002: Synchronized src/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 32s)
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|44eddcc}}: hrwiki: Deploy Growth features to newcomers ([[phab:T275684|T275684]]) (duration: 01m 08s)
* 22:00 mutante: wtp1026 - manually running '/usr/bin/sudo -u root -- /usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807'
* 18:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|179d9e5}}: mswiki: Enable Growth features in stealth mode ([[phab:T277562|T277562]]; 2/2) (duration: 01m 08s)
* 21:58 urbanecm@deploy1002: Synchronized rpc/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 31s)
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|179d9e5}}: mswiki: Enable Growth features in stealth mode ([[phab:T277562|T277562]]; 1/2) (duration: 01m 11s)
* 21:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:58 legoktm: disabled puppet on registry* for rolling out https://gerrit.wikimedia.org/r/672537
* 21:54 urbanecm@deploy1002: Synchronized multiversion/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 29s)
* 17:50 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|55aa6cb}}: tewiki: Enable Growth features in stealth mode ([[phab:T277491|T277491]]; 2/2) (duration: 01m 08s)
* 21:54 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2242.codfw.wmnet
* 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|55aa6cb}}: tewiki: Enable Growth features in stealth mode ([[phab:T277491|T277491]]; 1/2) (duration: 01m 10s)
* 21:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|04342e9bb0765a6a58ad78bd7eaa380d4167f0c1}}: simplewiki: Enable Growth team features in stealth mode ([[phab:T277550|T277550]]) (duration: 01m 09s)
* 21:51 urbanecm@deploy1002: Synchronized docroot/: {{Gerrit|ca3b94f2d9bc755d92839e5e69072615ea9008df}}: phpcs: start to fix SpaceBeforeSingleLineComment.NewLineComment ([[phab:T171115|T171115]]) (duration: 03m 38s)
* 17:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|04342e9bb0765a6a58ad78bd7eaa380d4167f0c1}}: simplewiki: Enable Growth team features in stealth mode ([[phab:T277550|T277550]]) (duration: 01m 10s)
* 21:49 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
* 17:40 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2242.codfw.wmnet
* 21:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
* 17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2241.codfw.wmnet
* 21:40 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
* 17:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2241.codfw.wmnet
* 21:38 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 17:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2240.codfw.wmnet
* 21:32 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 16:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2240.codfw.wmnet
* 21:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 16:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2239.codfw.wmnet
* 21:23 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 16:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2239.codfw.wmnet
* 21:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2242.codfw.wmnet
* 21:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2241.codfw.wmnet
* 21:10 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2240.codfw.wmnet
* 21:03 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2239.codfw.wmnet
* 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:33 shdubsh: clean up dead letter queue and restart all logstashes
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:37 dcausse: repooling wdqs1005
* 20:41 urbanecm@deploy1002: Synchronized docroot/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 41s)
* 14:29 hashar: Restarting CI Jenkins for plugin upgrade
* 20:37 urbanecm@deploy1002: Synchronized w/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 15s)
* 13:49 elukey: reboot analytics1066
* 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:23 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/Wikibase/repo: [[gerrit:673108{{!}}languageLabelDescriptionAliases: use getLanguageNameByCode]] ([[phab:T275611|T275611]] [[phab:T277722|T277722]]) (duration: 01m 14s)
* 20:34 urbanecm@deploy1002: Synchronized multiversion/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 28s)
* 12:58 jbond42: upload cas_6.3.2 to apt buster-wikimedia
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:37 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:34 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 20:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 11:25 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 20:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 11:24 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|896c9f019b17d1ad3a1589d377158ca2fb91ebaa}}: flaggedrevs: Disable multiple dimensions in hewikisource (duration: 01m 09s)
* 20:31 urbanecm@deploy1002: Synchronized wmf-config/: phpcs cleanups ([[phab:T171115|T171115]]; no-op for production) (duration: 03m 38s)
* 11:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/GrowthExperiments/includes/HomepageHooks.php: {{Gerrit|3b2aa1aa28e9d204f32ae937a84ec211137cbb2e}}: Remove variant C from list of valid variants ([[phab:T277727|T277727]]) (duration: 01m 09s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:16 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1021.eqiad.wmnet with OS buster
* 11:14 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 20:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1020.eqiad.wmnet with OS buster
* 11:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1018.eqiad.wmnet with OS buster
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0005676e704cad907655a4a0bca7bd2164714b1c}}: GrowthExperiments: set $wgGEHomepageNewAccountVariants to D only ([[phab:T277727|T277727]]) (duration: 01m 10s)
* 20:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1017.eqiad.wmnet with OS buster
* 11:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: NOOP: {{Gerrit|e7f5eac}}: Enable CentralAuth IRC feed in beta cluster ([[phab:T277432|T277432]]) (duration: 01m 12s)
* 19:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 09:13 _joe_: hard reboot of snapshot1005
* 19:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 09:04 _joe_: attempted reboot of snapshot1005, read-only filesystem and probably disks are broken beyond repair
* 19:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
* 08:27 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - [[phab:T272836|T272836]]
* 19:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 19:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
* 19:32 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14946 and previous config saved to /var/cache/conftool/dbconfig/20210318-080258-root.json
* 19:32 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
* 08:02 akosiaris: reimage ml-serve1004 to debug a docker volume_group issue
* 19:16 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14945 and previous config saved to /var/cache/conftool/dbconfig/20210318-074754-root.json
* 19:10 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14944 and previous config saved to /var/cache/conftool/dbconfig/20210318-073250-root.json
* 18:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 07:20 dcausse: depooling & restarting blazegraph on wdqs1005
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 07:19 marostegui: Deploy schema change on s4 codfw master, lag will appear - [[phab:T276150|T276150]] [[phab:T276156|T276156]]
* 18:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14943 and previous config saved to /var/cache/conftool/dbconfig/20210318-071747-root.json
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 18:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 18:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1161 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14942 and previous config saved to /var/cache/conftool/dbconfig/20210318-063241-marostegui.json
* 18:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2120', diff saved to https://phabricator.wikimedia.org/P14941 and previous config saved to /var/cache/conftool/dbconfig/20210318-062201-marostegui.json
* 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P14940 and previous config saved to /var/cache/conftool/dbconfig/20210318-060445-marostegui.json
* 18:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 03:46 andrewbogott: restarting slapd on seaborgium, serpens, and r-o ldap replicas (we're getting irregular connection failures)
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 00:05 eileen: tools revision changed from {{Gerrit|b7b4060c30}} to {{Gerrit|ef54260b0d}}
* 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:15 ayounsi@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=imagescaler-ro,name=codfw
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
* 17:57 brennen@deploy1002: Pruned MediaWiki: 1.39.0-wmf.14 (duration: 01m 53s)
* 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.16 (duration: 32m 52s)
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:25 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
* 17:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.16
* 17:13 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): train is blocked - will sync to testwikis and hold there for resolution of [[phab:T310532|T310532]]
* 16:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2053.codfw.wmnet with OS bullseye
* 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
* 16:12 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
* 16:12 jnuche@deploy1002: Installation of scap version "4.9.2" completed for 557 hosts
* 16:11 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 16:05 jnuche@deploy1002: Installing scap version "4.9.2" for 557 hosts
* 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 15:58 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2053.codfw.wmnet with reason: host reimage
* 15:34 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2053.codfw.wmnet with OS bullseye
* 15:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host elastic2053.codfw.wmnet
* 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host elastic2053.codfw.wmnet
* 15:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
* 14:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:53 moritzm: failover ganeti master in ulsfo to ganeti4003
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|596058b5e4d906d40e620fe5b01f37c484f5a8c1}}: Add new throttle rule + remove expired one ([[phab:T310625|T310625]]) (duration: 03m 38s)
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:33 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
* 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: reboots
* 14:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: reboots
* 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4003.ulsfo.wmnet
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4003.ulsfo.wmnet
* 14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2012.codfw.wmnet with OS buster
* 14:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2010.codfw.wmnet with OS buster
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2009.codfw.wmnet with OS buster
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 14:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2011.codfw.wmnet with OS buster
* 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2008.codfw.wmnet with OS buster
* 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
* 14:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2007.codfw.wmnet with OS buster
* 14:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2006.codfw.wmnet with OS buster
* 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
* 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 14:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2005.codfw.wmnet with OS buster
* 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29741 and previous config saved to /var/cache/conftool/dbconfig/20220614-132654-marostegui.json
* 13:13 urbanecm: UTC afternoon B&C window done
* 13:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1692de09bf04c724cf416679405d4b6485550d40}}: Disable DiscussionTools visualenhancements feature in production (duration: 03m 25s)
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29740 and previous config saved to /var/cache/conftool/dbconfig/20220614-131149-marostegui.json
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7f2dc7296f0c25d00e45651c50c3e45733cc63b3}}: Make new topic tool available as opt-out almost everywhere (phase 4; [[phab:T310392|T310392]]) (duration: 03m 45s)
* 13:06 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
* 13:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
* 13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
* 13:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P29739 and previous config saved to /var/cache/conftool/dbconfig/20220614-125644-marostegui.json
* 12:56 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
* 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS buster
* 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS buster
* 12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS buster
* 12:42 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS buster
* 12:41 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS buster
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29738 and previous config saved to /var/cache/conftool/dbconfig/20220614-124139-marostegui.json
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS buster
* 12:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2004.codfw.wmnet with OS buster
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS buster
* 12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS buster
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1157 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29737 and previous config saved to /var/cache/conftool/dbconfig/20220614-120323-marostegui.json
* 12:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 12:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29735 and previous config saved to /var/cache/conftool/dbconfig/20220614-115020-marostegui.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P29734 and previous config saved to /var/cache/conftool/dbconfig/20220614-113515-marostegui.json
* 11:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1173.eqiad.wmnet with OS bullseye
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29732 and previous config saved to /var/cache/conftool/dbconfig/20220614-110945-root.json
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29731 and previous config saved to /var/cache/conftool/dbconfig/20220614-110504-marostegui.json
* 11:02 moritzm: rebalancing ganeti cluster in esams [[phab:T308238|T308238]]
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
* 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29730 and previous config saved to /var/cache/conftool/dbconfig/20220614-105441-root.json
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 10:44 joal@deploy1002: Finished deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency (duration: 00m 09s)
* 10:44 joal@deploy1002: Started deploy [airflow-dags/analytics@24d8d72]: Upgrade jobs to spark3 and add consistency
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29729 and previous config saved to /var/cache/conftool/dbconfig/20220614-104021-marostegui.json
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
* 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29728 and previous config saved to /var/cache/conftool/dbconfig/20220614-103937-root.json
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:30 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3001.esams.wmnet to ganeti01.svc.esams.wmnet
* 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
* 10:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After migrating to 10.6', diff saved to https://phabricator.wikimedia.org/P29727 and previous config saved to /var/cache/conftool/dbconfig/20220614-102433-root.json
* 10:22 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
* 10:22 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye
* 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
* 10:19 marostegui: dbmaint s6@eqiad [[phab:T60674|T60674]]
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
* 10:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after