You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
== 2021-02-13 ==
* 03:23 ryankemper: Depooled `wdqs1006` to catch up on lag
* 03:23 ryankemper: Restarted blazegraph on `wdqs1006`
* 01:30 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mwdebug1002.eqiad.wmnet
* 01:00 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
* 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
* 00:49 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
* 00:38 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
* 00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1281.eqiad.wmnet
* 00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1282.eqiad.wmnet
* 00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1283.eqiad.wmnet
* 00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1284.eqiad.wmnet
* 00:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
* 00:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
* 00:26 mutante: ganeti - attempting to recreate VM mwdebug1002 (previously deleted manually) with cookbook ([[phab:T274689|T274689]] [[phab:T274023|T274023]])
* 00:25 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
* 00:08 mutante: ganeti1011 - manually deleting VM mwdebug1002 - [[phab:T274689|T274689]] [[phab:T274023|T274023]]
== 2022-06-23 ==
* 21:23 mutante: restbase-dev1006 has manually installed packages (wrk, maybe others)
* 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:22 brennen: end of UTC late backport & config window
* 21:21 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808055{{!}}[cleanup] Drop non-existent feature flags]] (duration: 03m 33s)
* 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 21:13 thcipriani@deploy1002: Finished scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]] (duration: 17m 29s)
* 21:01 inflatador: looking into wdqs1006 alert ^^
* 20:56 thcipriani@deploy1002: Started scap: Config: [[gerrit:808067{{!}}Change default skin on next set of pilot wikis to Vector (2022) (T307903)]]
* 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:49 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:808064{{!}}Enable DiscussionTools topicsubscription, autotopicsub on testwiki (T310808)]] (duration: 03m 18s)
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dse-k8s-ctrl1001.eqiad.wmnet
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:48 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:43 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806847{{!}}ukwikibooks: Add NS102 (Рецепт) to wgContentNamespaces (T310940)]] (duration: 03m 41s)
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:43 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl1001.eqiad.wmnet on all recursors
* 20:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:30 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl1001.eqiad.wmnet
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:15 mutante: cumin -b 15 -p 95 'mw1*' 'run-puppet-agent -q --failed-only'
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:11 mutante: cumin -b 15 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
* 20:09 mutante: cumin -b 15 -p 95 'parse*' 'run-puppet-agent -q --failed-only'
* 20:07 mutante: cumin -b 15 -p 95 'wtp*' 'run-puppet-agent -q --failed-only'
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:39 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:34 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:24 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 19:21 ejegg: fundraising python tools updated from {{Gerrit|40d376d4}} to {{Gerrit|acf89fb2}}
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:38 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
* 18:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]]
* 18:01 brennen: train 1.39.0-wmf.17 ([[phab:T308070|T308070]]): no current blockers - rolling to all wikis
* 18:01 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1016.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:53 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:32 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:00 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:17 hashar: Upgrading CI Jenkins # [[phab:T311174|T311174]]
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.17/extensions/WikibaseCirrusSearch/src/Hooks.php: Backport: [[gerrit:807902{{!}}Do not re-use "wikibase_config" for registering the language selector... (T307869)]] (duration: 03m 22s)
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30042 and previous config saved to /var/cache/conftool/dbconfig/20220623-150954-root.json
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30041 and previous config saved to /var/cache/conftool/dbconfig/20220623-150951-root.json
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30040 and previous config saved to /var/cache/conftool/dbconfig/20220623-150422-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30039 and previous config saved to /var/cache/conftool/dbconfig/20220623-145450-root.json
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30038 and previous config saved to /var/cache/conftool/dbconfig/20220623-145448-root.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30037 and previous config saved to /var/cache/conftool/dbconfig/20220623-144918-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30036 and previous config saved to /var/cache/conftool/dbconfig/20220623-143946-root.json
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30035 and previous config saved to /var/cache/conftool/dbconfig/20220623-143944-root.json
* 14:34 papaul: ongoing PDU maintenance in rack A3 codfw
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30034 and previous config saved to /var/cache/conftool/dbconfig/20220623-143414-root.json
* 14:31 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:30 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30033 and previous config saved to /var/cache/conftool/dbconfig/20220623-142443-root.json
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30032 and previous config saved to /var/cache/conftool/dbconfig/20220623-142440-root.json
* 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30031 and previous config saved to /var/cache/conftool/dbconfig/20220623-141910-root.json
* 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:10 taavi@deploy1002: Synchronized php-1.39.0-wmf.17/includes/skins/Skin.php: Backport: [[gerrit:807900{{!}}Skin: Change viewport based on feedback (T311119)]] (duration: 03m 29s)
* 14:10 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30030 and previous config saved to /var/cache/conftool/dbconfig/20220623-140939-root.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30029 and previous config saved to /var/cache/conftool/dbconfig/20220623-140936-root.json
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30028 and previous config saved to /var/cache/conftool/dbconfig/20220623-140406-root.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:02 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:00 volans@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update locations - volans@cumin1001"
* 14:00 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update locations - volans@cumin1001"
* 13:58 moritzm: import jenkins 2.346.1 to thirdparty/ci [[phab:T311174|T311174]]
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30027 and previous config saved to /var/cache/conftool/dbconfig/20220623-135435-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30026 and previous config saved to /var/cache/conftool/dbconfig/20220623-135432-root.json
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30025 and previous config saved to /var/cache/conftool/dbconfig/20220623-134902-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30024 and previous config saved to /var/cache/conftool/dbconfig/20220623-133931-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30023 and previous config saved to /var/cache/conftool/dbconfig/20220623-133928-root.json
* 13:38 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (2/2) (duration: 03m 26s)
* 13:34 taavi@deploy1002: Synchronized static/images/mobile/copyright/: Config: [[gerrit:807247{{!}}Add wordmark and tagline for jvwiki, jvwikt, and jvws (T311104)]] (1/2) (duration: 03m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30022 and previous config saved to /var/cache/conftool/dbconfig/20220623-133358-root.json
* 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182 db1184 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30021 and previous config saved to /var/cache/conftool/dbconfig/20220623-132951-root.json
* 13:27 sukhe: disable puppet on A:durum or A:wikidough or A:centrallog or A:dns-rec: deploying [[phab:T310574|T310574]]
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P30020 and previous config saved to /var/cache/conftool/dbconfig/20220623-132729-root.json
* 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30019 and previous config saved to /var/cache/conftool/dbconfig/20220623-132133-root.json
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30018 and previous config saved to /var/cache/conftool/dbconfig/20220623-132128-root.json
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:15 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807050{{!}}[ImageSuggestions] Enable extension on ptwiki, ruwiki & idwiki (T302711)]] (duration: 03m 44s)
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30017 and previous config saved to /var/cache/conftool/dbconfig/20220623-130629-root.json
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30016 and previous config saved to /var/cache/conftool/dbconfig/20220623-130624-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30015 and previous config saved to /var/cache/conftool/dbconfig/20220623-125553-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30014 and previous config saved to /var/cache/conftool/dbconfig/20220623-125547-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30013 and previous config saved to /var/cache/conftool/dbconfig/20220623-125125-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30012 and previous config saved to /var/cache/conftool/dbconfig/20220623-125120-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30011 and previous config saved to /var/cache/conftool/dbconfig/20220623-124049-root.json
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30010 and previous config saved to /var/cache/conftool/dbconfig/20220623-124043-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30009 and previous config saved to /var/cache/conftool/dbconfig/20220623-123621-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30008 and previous config saved to /var/cache/conftool/dbconfig/20220623-123616-root.json
* 12:26 moritzm: installing waitress security updates
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30007 and previous config saved to /var/cache/conftool/dbconfig/20220623-122545-root.json
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30006 and previous config saved to /var/cache/conftool/dbconfig/20220623-122539-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30005 and previous config saved to /var/cache/conftool/dbconfig/20220623-122118-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30004 and previous config saved to /var/cache/conftool/dbconfig/20220623-122112-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30003 and previous config saved to /var/cache/conftool/dbconfig/20220623-121041-root.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30002 and previous config saved to /var/cache/conftool/dbconfig/20220623-121035-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30001 and previous config saved to /var/cache/conftool/dbconfig/20220623-120614-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P30000 and previous config saved to /var/cache/conftool/dbconfig/20220623-120608-root.json
* 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: webauthn tests
* 11:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29999 and previous config saved to /var/cache/conftool/dbconfig/20220623-115537-root.json
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29998 and previous config saved to /var/cache/conftool/dbconfig/20220623-115532-root.json
* 11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29997 and previous config saved to /var/cache/conftool/dbconfig/20220623-115110-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29996 and previous config saved to /var/cache/conftool/dbconfig/20220623-115104-root.json
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128 db1129 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29995 and previous config saved to /var/cache/conftool/dbconfig/20220623-114159-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29994 and previous config saved to /var/cache/conftool/dbconfig/20220623-114033-root.json
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29993 and previous config saved to /var/cache/conftool/dbconfig/20220623-114028-root.json
* 11:32 kart_: Updated cxserver to 2022-06-23-052732-production ([[phab:T311196|T311196]])
* 11:31 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 11:31 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 11:30 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 11:29 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 11:28 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 11:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29992 and previous config saved to /var/cache/conftool/dbconfig/20220623-112529-root.json
* 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29991 and previous config saved to /var/cache/conftool/dbconfig/20220623-112524-root.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 es1024 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29990 and previous config saved to /var/cache/conftool/dbconfig/20220623-110804-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29989 and previous config saved to /var/cache/conftool/dbconfig/20220623-105333-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29988 and previous config saved to /var/cache/conftool/dbconfig/20220623-105326-root.json
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29987 and previous config saved to /var/cache/conftool/dbconfig/20220623-105320-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29986 and previous config saved to /var/cache/conftool/dbconfig/20220623-103829-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29985 and previous config saved to /var/cache/conftool/dbconfig/20220623-103822-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29984 and previous config saved to /var/cache/conftool/dbconfig/20220623-103816-root.json
* 10:25 jayme: running restart-php7.2-fpm A:parsoid or A:mw or A:mw-api to disable opcache revalidation - [[phab:T266055|T266055]]
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29983 and previous config saved to /var/cache/conftool/dbconfig/20220623-102325-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29982 and previous config saved to /var/cache/conftool/dbconfig/20220623-102318-root.json
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29981 and previous config saved to /var/cache/conftool/dbconfig/20220623-102312-root.json
* 10:21 XioNoX: fix eqiad lvs switch port MTU
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29980 and previous config saved to /var/cache/conftool/dbconfig/20220623-100822-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29979 and previous config saved to /var/cache/conftool/dbconfig/20220623-100815-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29978 and previous config saved to /var/cache/conftool/dbconfig/20220623-100808-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29977 and previous config saved to /var/cache/conftool/dbconfig/20220623-095318-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29976 and previous config saved to /var/cache/conftool/dbconfig/20220623-095311-root.json
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29975 and previous config saved to /var/cache/conftool/dbconfig/20220623-095304-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29973 and previous config saved to /var/cache/conftool/dbconfig/20220623-093814-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29972 and previous config saved to /var/cache/conftool/dbconfig/20220623-093807-root.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29971 and previous config saved to /var/cache/conftool/dbconfig/20220623-093800-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29970 and previous config saved to /var/cache/conftool/dbconfig/20220623-092310-root.json
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29969 and previous config saved to /var/cache/conftool/dbconfig/20220623-092303-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29968 and previous config saved to /var/cache/conftool/dbconfig/20220623-092256-root.json
* 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178 db1179 db1180 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29967 and previous config saved to /var/cache/conftool/dbconfig/20220623-090842-root.json
* 09:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:52 joal@deploy1002: Finished deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs (duration: 00m 08s)
* 08:52 joal@deploy1002: Started deploy [airflow-dags/analytics@b3fe77c]: Small fixes to 2 jobs
* 08:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: Reboots
* 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 13 hosts with reason: Reboots
* 08:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 13 hosts with reason: Reboots
* 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2135].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2134].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2133].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2078,2132].codfw.wmnet with reason: Reboots
* 08:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 14 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: Reboots
* 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 7 hosts with reason: Reboots
* 07:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 7 hosts with reason: Reboots
* 07:39 moritzm: installing firejail security updates
* 07:36 TheresNoTime: UTC morning deploys done
* 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:25 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806365{{!}}GrowthExperiments: Enable link recommendations frontend, round 4 (T304548)]] (duration: 03m 37s)
* 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 22 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: Reboots
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: Reboots
* 00:35 brennen: end of phabricator maintenance window
* 00:13 brennen: phabricator deploy finished ([[phab:T311175|T311175]])
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: maintenance
* 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance


== 2022-06-22 ==
* 22:56 tzatziki: removing 1 file for legal compliance
* 21:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1007.eqiad.wmnet with OS bullseye
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 to resolve Old GC Hell alert
* 21:44 ebernhardson: restart elasticsearch_6@cloudelastic-chi-eqiad to resolve Old GC Hell alert
* 21:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:49 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44] (duration: 01m 18s)
* 20:48 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry force [analytics/refinery@99cca44]
* 20:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
* 20:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:27 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 20:24 cjming: end of UTC late backport window
* 20:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 20:19 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44] (duration: 07m 36s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807593{{!}}gawiki: Change category collation from `uppercase` to `uca-ga-u-kn` (T311136)]] (duration: 03m 39s)
* 20:13 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS bullseye
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@99cca44]
* 20:11 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44] (duration: 00m 07s)
* 20:11 aqu@deploy1002: Started deploy [analytics/refinery@99cca44] (thin): Regular analytics weekly train THIN [analytics/refinery@99cca44]
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:10 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44] (duration: 06m 16s)
* 20:03 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train retry [analytics/refinery@99cca44]
* 20:03 aqu@deploy1002: Finished deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44] (duration: 30m 58s)
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS bullseye
* 19:42 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1006.eqiad.wmnet with OS buster
* 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection (duration: 02m 03s)
* 19:37 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@1f2f286]: namespace maps: Exclude labtest database group from data collection
* 19:32 aqu@deploy1002: Started deploy [analytics/refinery@99cca44]: Regular analytics weekly train [analytics/refinery@99cca44]
* 19:31 aqu: Deploying analytics/refinery (weekly train)
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1006.eqiad.wmnet with OS buster
* 19:14 herron: bounced apache on lists1001
* 19:06 hashar: Restarting CI Jenkins
* 16:46 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1009.eqiad.wmnet with OS bullseye
* 16:45 hashar: Restarting CI Jenkins
* 16:43 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
* 16:33 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:29 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1009.eqiad.wmnet with reason: host reimage
* 16:18 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:14 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 16:13 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 16:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
* 16:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
* 16:08 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
* 16:06 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
* 16:05 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
* 16:04 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
* 15:36 moritzm: upload jenkins 2.332.4 to apt.wikimedia.org [[phab:T311068|T311068]]
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
* 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 15:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 15:00 jayme: published docker-registry.discovery.wmnet/helm-state-metrics:0.1.0-1 - [[phab:T310714|T310714]]
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 14:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
* 14:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
* 14:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:09 Lucas_WMDE: UTC afternoon backport+config window done
* 14:09 lucaswerkmeister-wmde@deploy1002: Synchronized logos/manage.py: Config: [[gerrit:807486{{!}}logos: Update phpcs comment]] (should be a no-op but syncing just in case) (duration: 03m 19s)
* 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:04 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
* 14:01 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/%s\n' specieswiki<nowiki>{</nowiki>,-<nowiki>{</nowiki>1.5,2<nowiki>}</nowiki>x<nowiki>}</nowiki>.png {{!}} mwscript purgeList.php # [[phab:T310961|T310961]]
* 14:01 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (3/3) (duration: 03m 30s)
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (2/3) (duration: 03m 29s)
* 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:56 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
* 13:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
* 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/project-logos/: Config: [[gerrit:807491{{!}}specieswiki: Adjust width-height ratio of logo to fix display issue (T310961)]] (1/3) (duration: 03m 46s)
* 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
* 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (2/2) (duration: 03m 39s)
* 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:803496{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (3/3) (T304328)]] (1/2) (duration: 03m 35s)


== 2021-02-12 ==
* 23:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1348.eqiad.wmnet
* 23:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1356.eqiad.wmnet
* 23:43 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 23:42 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1283.eqiad.wmnet
* 23:42 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1282.eqiad.wmnet
* 23:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1348.eqiad.wmnet
* 23:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1356.eqiad.wmnet
* 23:41 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1221.eqiad.wmnet
* 23:39 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1221.eqiad.wmnet
* 23:38 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1281.eqiad.wmnet
* 23:26 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:24 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:14 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 23:02 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 22:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 22:51 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1284.eqiad.wmnet with reason: REIMAGE
* 22:49 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1283.eqiad.wmnet with reason: REIMAGE
* 22:48 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1284.eqiad.wmnet with reason: REIMAGE
* 22:47 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1283.eqiad.wmnet with reason: REIMAGE
* 22:47 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1282.eqiad.wmnet with reason: REIMAGE
* 22:45 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1282.eqiad.wmnet with reason: REIMAGE
* 22:44 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1281.eqiad.wmnet with reason: REIMAGE
* 22:42 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1281.eqiad.wmnet with reason: REIMAGE
* 22:32 krinkle@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: {{Gerrit|Idc385de0}} cleanup (duration: 05m 14s)
* 22:15 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|b3447343a}} cleanup (duration: 05m 20s)
* 22:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1348.eqiad.wmnet with reason: REIMAGE
* 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1348.eqiad.wmnet with reason: REIMAGE
* 21:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1357.eqiad.wmnet
* 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1357.eqiad.wmnet with reason: REIMAGE
* 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1356.eqiad.wmnet with reason: REIMAGE
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1357.eqiad.wmnet with reason: REIMAGE
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1356.eqiad.wmnet with reason: REIMAGE
* 20:36 mutante: mwdebug1003 now on buster - mwdebug1002 rebooting and reimaging to buster
* 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade
* 20:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade
* 20:32 mutante: mw1353, mw1358 - scap pull, repooled
* 20:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1353.eqiad.wmnet
* 20:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1358.eqiad.wmnet
* 20:17 mutante: mwdebug2001 - restarted memcached
* 20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1358.eqiad.wmnet
* 20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1353.eqiad.wmnet
* 19:56 mutante: mwdebug2002 - restart memcached
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1358.eqiad.wmnet with reason: OS upgrade
* 19:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1358.eqiad.wmnet with reason: OS upgrade
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1353.eqiad.wmnet with reason: OS upgrade
* 19:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1353.eqiad.wmnet with reason: OS upgrade
* 19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1003.eqiad.wmnet with reason: OS upgrade
* 19:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1003.eqiad.wmnet with reason: OS upgrade
* 19:43 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back commonswiki to 1.36.0-wmf.27 due to [[phab:T274589|T274589]]
* 19:42 mutante: mwdebug2001 now on buster - mwdebug1003 rebooting and reimaging to stretch
* 19:38 milimetric@deploy1001: Finished deploy [analytics/refinery@e0c09a2] (thin): Fix for mediarequest per file cassandra job - 2 (duration: 00m 06s)
* 19:38 milimetric@deploy1001: Started deploy [analytics/refinery@e0c09a2] (thin): Fix for mediarequest per file cassandra job - 2
* 19:38 milimetric@deploy1001: Finished deploy [analytics/refinery@e0c09a2]: Fix for mediarequest per file cassandra job - 2 (duration: 11m 01s)
* 19:34 twentyafterfour: Train status: Rolling back commonswiki to wmf.27 due to [[phab:T274589|T274589]] (refs [[phab:T271344|T271344]])
* 19:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1358.eqiad.wmnet with reason: REIMAGE
* 19:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1353.eqiad.wmnet with reason: REIMAGE
* 19:28 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
* 19:27 milimetric@deploy1001: Started deploy [analytics/refinery@e0c09a2]: Fix for mediarequest per file cassandra job - 2
* 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1358.eqiad.wmnet with reason: REIMAGE
* 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1353.eqiad.wmnet with reason: REIMAGE
* 19:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
* 19:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
* 19:18 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
* 19:18 milimetric@deploy1001: Finished deploy [analytics/refinery@366962f]: Fix for mediarequest per file cassandra job (duration: 11m 58s)
* 19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
* 19:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
* 19:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
* 19:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
* 19:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
* 19:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
* 19:06 milimetric@deploy1001: Started deploy [analytics/refinery@366962f]: Fix for mediarequest per file cassandra job
* 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug2001.codfw.wmnet with reason: OS upgrade
* 19:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug2001.codfw.wmnet with reason: OS upgrade
* 19:02 mutante: rebooting and reimaging mwdebug2001 to buster [[phab:T274023|T274023]]
* 18:35 mutante: mwdebug2002 now a buster VM; you can find a .tar.gz in your home dir with the contents of your previous home
* 18:30 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@98264b8]: airflow: review and correct usage of catchup=False (duration: 03m 10s)
* 18:27 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@98264b8]: airflow: review and correct usage of catchup=False
* 17:33 elukey@cumin1001: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
* 17:23 bblack: cp*: re-enabling puppet after successful agent run on one host as a test!
* 17:13 bblack: cp*: disable puppet ahead of https://gerrit.wikimedia.org/r/c/operations/puppet/+/663845
* 17:08 elukey@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
* 17:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
* 16:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
* 16:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
* 16:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
* 16:38 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
* 16:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
* 16:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
* 16:12 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297] (hadoop-test): Fix for data quality alarms after BigTop migration TEST [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 04m 05s)
* 16:11 hnowlan: joining maps2007 to cassandra cluster
* 16:08 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297] (hadoop-test): Fix for data quality alarms after BigTop migration TEST [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
* 16:08 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297] (thin): Fix for data quality alarms after BigTop migration THIN [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 00m 06s)
* 16:07 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297] (thin): Fix for data quality alarms after BigTop migration THIN [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
* 16:07 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297]: Fix for data quality alarms after BigTop migration [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 38m 56s)
* 15:28 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297]: Fix for data quality alarms after BigTop migration [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
* 15:22 herron: rolling reboot of alert[12]001 hosts for updates
* 15:16 elukey: roll restart druid broker on druid-public to pick up new settings
* 14:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1022.eqiad.wmnet
* 14:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1022.eqiad.wmnet
* 14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1090.eqiad.wmnet
* 14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1089.eqiad.wmnet
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
* 14:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3065.esams.wmnet
* 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 14:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3064.esams.wmnet
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3065.esams.wmnet
* 13:28 XioNoX: fix MTU on eqiad server facing switch ports
* 13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3064.esams.wmnet
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1090.eqiad.wmnet
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 13:27 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 13:10 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1005.eqiad.wmnet
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 12:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps2007.codfw.wmnet with reason: Resyncing database
* 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 12:11 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps2007.codfw.wmnet with reason: Resyncing database
* 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:807255{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (2/3) (T304328)]] (duration: 03m 35s)
* 11:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 11:27 moritzm: installing emacs updates from buster point release
* 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 11:25 moritzm: installing device-tree-compiler updates from buster point release
* 13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1088.eqiad.wmnet
* 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 11:22 moritzm: installing node-ini security updates
* 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:807254{{!}}Rename wmgWikibaseUseSSRTermbox to wmgWikibaseTermboxEnabled (1/3) (T304328)]] (duration: 03m 35s)
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1087.eqiad.wmnet
* 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet
* 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 11:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet
* 13:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
* 11:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 13:10 XioNoX: fix MTU in drmrs
* 11:14 moritzm: installing golang-1.11 security updates
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 11:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:807211{{!}}[wmf-config]: Deploy GDI Survey Wave 2 - BETA (T311079)]] (duration: 03m 29s)
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet
* 12:58 XioNoX: fix MTU on codfw switches access ports
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3062.esams.wmnet
* 12:57 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1088.eqiad.wmnet
* 12:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1087.eqiad.wmnet
* 12:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 11:10 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 100%', diff saved to https://phabricator.wikimedia.org/P14337 and previous config saved to /var/cache/conftool/dbconfig/20210212-111010-jynus.json
* 12:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
* 11:06 moritzm: installing xcftools security updates
* 12:24 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
* 10:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2039.codfw.wmnet
* 12:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
* 10:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2040.codfw.wmnet
* 12:23 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 10:50 legoktm: repooled registry1002 after revert
* 12:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 10:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2040.codfw.wmnet
* 12:18 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 10:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2039.codfw.wmnet
* 12:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 10:39 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 75%', diff saved to https://phabricator.wikimedia.org/P14336 and previous config saved to /var/cache/conftool/dbconfig/20210212-103921-jynus.json
* 12:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 10:24 moritzm: installing wireshark security updates for stretch
* 12:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 10:22 legoktm: depooled registry1002 while fixing/debugging nginx config
* 12:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 10:22 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Victorgrigas . # [[phab:T274608|T274608]]
* 12:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 10:18 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 50%', diff saved to https://phabricator.wikimedia.org/P14335 and previous config saved to /var/cache/conftool/dbconfig/20210212-101814-jynus.json
* 11:46 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
* 11:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 10:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet
* 11:41 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 10:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4026.ulsfo.wmnet
* 11:11 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 20s)
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1086.eqiad.wmnet
* 11:10 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3061.esams.wmnet
* 11:09 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 01m 11s)
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3060.esams.wmnet
* 11:08 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1085.eqiad.wmnet
* 11:07 volans@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps (duration: 02m 54s)
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2038.codfw.wmnet
* 11:05 volans@deploy1002: Started deploy [netbox/deploy@7bbf659]: Adding wmflib to venv deps
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet
* 10:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2037.codfw.wmnet
* 10:53 jayme: systemctl restart rsyslog on kubernetes2008
* 10:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5006.eqsin.wmnet
* 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
* 10:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
* 10:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
* 09:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
* 10:41 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
* 09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5006.eqsin.wmnet
* 10:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
* 09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5012.eqsin.wmnet
* 10:36 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
* 09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4026.ulsfo.wmnet
* 10:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
* 09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet
* 10:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3061.esams.wmnet
* 10:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
* 10:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2038.codfw.wmnet
* 10:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2037.codfw.wmnet
* 10:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1086.eqiad.wmnet
* 10:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
* 09:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1085.eqiad.wmnet
* 10:17 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
* 09:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
* 10:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
* 09:45 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 30%', diff saved to https://phabricator.wikimedia.org/P14334 and previous config saved to /var/cache/conftool/dbconfig/20210212-094520-jynus.json
* 10:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 09:32 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 20%', diff saved to https://phabricator.wikimedia.org/P14333 and previous config saved to /var/cache/conftool/dbconfig/20210212-093211-jynus.json
* 10:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
* 09:31 moritzm: installing node-y18n security updates
* 10:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2002.wikimedia.org with reason: REIMAGE
* 10:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2003.codfw.wmnet
* 08:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2002.wikimedia.org with reason: REIMAGE
* 10:04 moritzm: installing vim security updates
* 08:25 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 10%', diff saved to https://phabricator.wikimedia.org/P14331 and previous config saved to /var/cache/conftool/dbconfig/20210212-082526-jynus.json
* 09:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 08:15 moritzm: reimaging bast2002 to buster
* 09:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
* 07:54 elukey: roll restart of druid brokers on druid-public - locked after scheduled datasource deletion
* 09:35 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 03:36 krinkle@deploy1001: Finished deploy [integration/docroot@3c943ba]: {{Gerrit|I89e1ec881}} (duration: 00m 08s)
* 09:35 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Adding support for Ganeti groups
* 03:36 krinkle@deploy1001: Started deploy [integration/docroot@3c943ba]: {{Gerrit|I89e1ec881}}
* 09:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1329.eqiad.wmnet
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1330.eqiad.wmnet
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1331.eqiad.wmnet
* 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1332.eqiad.wmnet
* 09:17 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1332.eqiad.wmnet
* 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1331.eqiad.wmnet
* 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1330.eqiad.wmnet
* 09:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
* 01:07 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1329.eqiad.wmnet
* 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 01:06 Urbanecm: Evening B&C done
* 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 01:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|389f7f1fdc9ad4a0c163ccfe1d80f2aaec7f8038}}: Enable DiscussionTools Reply Tool A/B test ([[phab:T273554|T273554]]) (duration: 01m 08s)
* 08:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
* 01:02 urbanecm@deploy1001: sync-file aborted: {{Gerrit|389f7f1fdc9ad4a0c163ccfe1d80f2aaec7f8038}}: Enable DiscussionTools Reply Tool A/B test (duration: 00m 48s)
* 08:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 01:01 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/VisualEditor/: {{Gerrit|c86cd00076c9f1857f4bafb04a15640ad66da863}}: {{Gerrit|de4a562d3baec77c85bfa05ba59778b882a6f9d2}}: VE backports ([[phab:T273096|T273096]]) (duration: 01m 15s)
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29964 and previous config saved to /var/cache/conftool/dbconfig/20220622-084234-root.json
* 00:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d92ed15c51d57f43bad054d0469f54848b84d6a}}: Add import sources for zh_yuewiki ([[phab:T274597|T274597]]) (duration: 01m 13s)
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29963 and previous config saved to /var/cache/conftool/dbconfig/20220622-084225-root.json
* 00:34 foks: removing 2 files for legal compliance
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29962 and previous config saved to /var/cache/conftool/dbconfig/20220622-084206-root.json
* 00:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a022f2b506089ab518b74c1dfca78924c06dc80f}}: Oversample DiscussionTools EditAttemptStep logging ([[phab:T273946|T273946]]) (duration: 01m 08s)
* 08:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 00:30 Urbanecm: mwscript namespaceDupes.php itwikiquote --fix --add-prefix=BROKEN # [[phab:T273362|T273362]]
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29961 and previous config saved to /var/cache/conftool/dbconfig/20220622-082730-root.json
* 00:29 Urbanecm: mwscript namespaceDupes.php itwikiquote --fix # [[phab:T273362|T273362]]
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29960 and previous config saved to /var/cache/conftool/dbconfig/20220622-082721-root.json
* 00:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f051c6cdaa162ce2ea42aa53a24e50bb4aa8a793}}: Adding WQ as namespace alias for itwikiquote ([[phab:T273362|T273362]]) (duration: 01m 10s)
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29959 and previous config saved to /var/cache/conftool/dbconfig/20220622-082702-root.json
* 00:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|53229b0f41eb8cc3e8a90157283913c7d69810df}}: Enabling extension SandboxLink on ltwiki ([[phab:T273957|T273957]]) (duration: 01m 07s)
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
* 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
* 08:26 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 00:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
* 08:18 marostegui: Upgrade kernel and reboot on db[1111,1132,1143,1127].eqiad.wmnet
* 00:07 ejegg: updated fundraising civicrm from {{Gerrit|b81cb5e702}} to {{Gerrit|dfbb8f41bc}}
* 08:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 08:15 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]] (duration: 03m 43s)
* 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29957 and previous config saved to /var/cache/conftool/dbconfig/20220622-081227-root.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29956 and previous config saved to /var/cache/conftool/dbconfig/20220622-081217-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29955 and previous config saved to /var/cache/conftool/dbconfig/20220622-081159-root.json
* 08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.17  refs [[phab:T308070|T308070]]
* 08:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
* 08:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
* 08:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
* 08:04 hashar: Updating operations-puppet-tests-buster-docker Jenkins job to use the latest Docker image (rebuild to catch up with latest defined gems). https://gerrit.wikimedia.org/r/c/integration/config/+/807478
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29954 and previous config saved to /var/cache/conftool/dbconfig/20220622-075721-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29953 and previous config saved to /var/cache/conftool/dbconfig/20220622-075713-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29952 and previous config saved to /var/cache/conftool/dbconfig/20220622-075655-root.json
* 07:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 07:53 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
* 07:50 marostegui: Upgrade kernel and reboot on db[2145-2150].codfw.wmnet
* 07:49 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29951 and previous config saved to /var/cache/conftool/dbconfig/20220622-074217-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29950 and previous config saved to /var/cache/conftool/dbconfig/20220622-074209-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29949 and previous config saved to /var/cache/conftool/dbconfig/20220622-074151-root.json
* 07:40 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 07:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 07:31 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29948 and previous config saved to /var/cache/conftool/dbconfig/20220622-072714-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29947 and previous config saved to /var/cache/conftool/dbconfig/20220622-072705-root.json
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29946 and previous config saved to /var/cache/conftool/dbconfig/20220622-072647-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29945 and previous config saved to /var/cache/conftool/dbconfig/20220622-071210-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29944 and previous config saved to /var/cache/conftool/dbconfig/20220622-071201-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 2%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29943 and previous config saved to /var/cache/conftool/dbconfig/20220622-071143-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 es1026 es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P29942 and previous config saved to /var/cache/conftool/dbconfig/20220622-065507-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Switchover es1, es2 and es3 masters', diff saved to https://phabricator.wikimedia.org/P29941 and previous config saved to /var/cache/conftool/dbconfig/20220622-065208-marostegui.json
* 05:52 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 01:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:17 tstarling@deploy1002: Synchronized wmf-config/mc-labs.php: for completeness (duration: 03m 41s)
* 01:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 01:13 tstarling@deploy1002: Synchronized wmf-config/mc.php: g 807158 [[phab:T278392|T278392]] (duration: 03m 35s)
* 01:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 01:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 01:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply


== 2021-02-11 ==
== 2022-06-21 ==
* 23:50 Urbanecm: Deploy security patch for [[phab:T274514|T274514]]
* 20:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b42e57d75ec6b0536493fa073805a0bcb066aef1}}: zhwikiquote: Disable local upload ([[phab:T311017|T311017]]) (duration: 03m 43s)
* 23:47 mutante: reimaged mwdebug2002 with buster - since this is a VM: manually cleaned puppet cert on puppetmaster1001, signed new cert for same hostname, initial puppet run etc ([[phab:T274023|T274023]])
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:44 twentyafterfour: Train status for wmf.30 ([[phab:T271344|T271344]]) is blocked until monday. leaving wmf.30 on group1 and wmf.27 on group2 in spite of [[phab:T260401|T260401]]
* 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: REIMAGE
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: REIMAGE
* 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
* 20:22 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (2/2) (duration: 03m 28s)
* 23:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 23:20 mutante: reimaging mwdebug2002 - stretch -> buster
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:57 Urbanecm: Run scap pull at mwmaint1002
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:53 mutante: powercycling crashed mwmaint1002
* 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:53 Urbanecm: Deploy security  patch for [[phab:T274514|T274514]]
* 20:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|721e413fff4e797626c7c5e8433130f341310af0}}: zh_classicalwiki: Declare commons files for logo (1/2) (duration: 03m 30s)
* 22:11 legoktm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/GlobalWatchlist: GlobalWatchlist backports (duration: 01m 11s)
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 22:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1332.eqiad.wmnet with reason: REIMAGE
* 20:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f70e302e11756d9704acc86c45b3d7aabf31c4d}}: fawiktionary: Enable SandboxLink extension ([[phab:T308505|T308505]]) (duration: 03m 37s)
* 22:03 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1331.eqiad.wmnet with reason: REIMAGE
* 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 22:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1332.eqiad.wmnet with reason: REIMAGE
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 22:01 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1330.eqiad.wmnet with reason: REIMAGE
* 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 22:01 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1331.eqiad.wmnet with reason: REIMAGE
* 19:38 dancy@deploy1002: backport aborted: (duration: 00m 10s)
* 21:59 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1329.eqiad.wmnet with reason: REIMAGE
* 19:38 dancy@deploy1002: Installation of scap version "4.9.5" completed for 558 hosts
* 21:59 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1330.eqiad.wmnet with reason: REIMAGE
* 19:38 dancy@deploy1002: Installing scap version "4.9.5" for 558 hosts
* 21:57 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1329.eqiad.wmnet with reason: REIMAGE
* 19:22 urandom: replicating Cassandra `system_auth` keyspace to codfw -- [[phab:T307641|T307641]]
* 21:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1354.eqiad.wmnet
* 18:56 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2` failed due to syntax error, patch here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/807200
* 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1354.eqiad.wmnet
* 18:48 ryankemper: [[phab:T301461|T301461]] `ryankemper@miscweb1002:~$ sudo systemctl reload apache2`
* 21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1001.wikimedia.org
* 21:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1355.eqiad.wmnet
* 17:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1359.eqiad.wmnet
* 17:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 21:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1359.eqiad.wmnet
* 17:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp1001.wikimedia.org
* 21:37 mutante: mw1355, mw1359 - power cycling
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2001.wikimedia.org
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1354.eqiad.wmnet with reason: REIMAGE
* 17:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1354.eqiad.wmnet with reason: REIMAGE
* 17:19 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host elastic1049.eqiad.wmnet
* 21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1360.eqiad.wmnet
* 17:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1360.eqiad.wmnet
* 17:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts idp2001.wikimedia.org
* 21:05 mutante: mw1360 - powercycling
* 17:14 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 21:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1364.eqiad.wmnet
* 17:09 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1049.eqiad.wmnet
* 20:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1364.eqiad.wmnet
* 17:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 20:52 mutante: mw1364 - powercycled
* 17:01 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1355.eqiad.wmnet with reason: REIMAGE
* 16:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
* 20:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1355.eqiad.wmnet with reason: REIMAGE
* 16:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1359.eqiad.wmnet with reason: REIMAGE
* 16:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1359.eqiad.wmnet with reason: REIMAGE
* 16:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
* 20:26 twentyafterfour: new train blocker preventing deploy of 1.36.0-wmf.30 to all wikis. [[phab:T274589|T274589]] blocks [[phab:T271344|T271344]]
* 15:59 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
* 20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1365.eqiad.wmnet
* 15:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 20:23 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1365.eqiad.wmnet
* 15:55 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2048.codfw.wmnet
* 20:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1360.eqiad.wmnet with reason: REIMAGE
* 15:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
* 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1360.eqiad.wmnet with reason: REIMAGE
* 15:52 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
* 20:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1361.eqiad.wmnet
* 15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 20:09 mutante: mw1365 - powercycle - reboot issue
* 15:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 20:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1361.eqiad.wmnet
* 15:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (2/2) (duration: 03m 28s)
* 20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1364.eqiad.wmnet with reason: REIMAGE
* 15:37 klausman: restarting pybal on lvs2009
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1364.eqiad.wmnet with reason: REIMAGE
* 15:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
* 19:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1362.eqiad.wmnet
* 15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 19:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1362.eqiad.wmnet
* 15:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:806877{{!}}Enable Lexeme Lua access everywhere (T309593)]] (1/2) (duration: 03m 51s)
* 19:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1368.eqiad.wmnet
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1361.eqiad.wmnet with reason: REIMAGE
* 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:40 mutante: mw1368 - had the reboot via IPMI issue, did DRAC reset and repeated wmf-autoreimage, issue did not happen again
* 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 19:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1368.eqiad.wmnet
* 15:30 klausman: Restarting pybal on lvs2010
* 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1361.eqiad.wmnet with reason: REIMAGE
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:32 urbanecm@deploy1001: Synchronized wmf-config/logos.php: noop: {{Gerrit|a1244df3e829abc793113a7e32d1972db9f780a8}}: Add inline documentation to configuration about updating logos regarding labs (duration: 01m 08s)
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 19:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1365.eqiad.wmnet with reason: REIMAGE
* 15:27 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|93e168cb7788c772895b47f239275544fb745358}}: Added Kokebok namespace to nowikibooks ([[phab:T274265|T274265]]) (duration: 01m 20s)
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2002.codfw.wmnet
* 19:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1362.eqiad.wmnet with reason: REIMAGE
* 15:26 klausman@puppetmaster1001: conftool action : set/weight=1; selector: name=ml-staging2001.codfw.wmnet
* 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1365.eqiad.wmnet with reason: REIMAGE
* 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1362.eqiad.wmnet with reason: REIMAGE
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging-ctrl2002.codfw.wmnet
* 19:20 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2002.codfw.wmnet
* 19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet
* 15:17 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-staging2001.codfw.wmnet
* 19:13 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:16 klausman@cumin1001: conftool action : help; selector: name=ml-staging2001
* 19:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1363.eqiad.wmnet
* 15:15 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:13 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 15:06 moritzm: installing avahi security updates
* 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 15:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 19:04 mutante: mw1363 - powercycled, reboot issue
* 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1374.eqiad.wmnet
* 15:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1374.eqiad.wmnet
* 15:01 papaul: PDU swap for rack a2 complete
* 18:46 mutante: mw1368 - racadm racreset
* 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:46 mutante: mw1368 - reboot via IPMI issue & can't powercycle "Unable to perform requested operation." - racreet
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:43 mutante: mw1374 - powercycled, reboot via ipmi issue
* 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:24 papaul: on going maintenance on ps1-a2-codfw
* 18:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:59 bblack: lvs2007 - downtimes ended, back in service - [[phab:T274571|T274571]]
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE
* 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE
* 13:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 17:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE
* 13:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
* 17:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE
* 13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
* 17:52 bblack: lvs2007 - starting up puppet + pybal - [[phab:T274571|T274571]]
* 13:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
* 17:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1375.eqiad.wmnet
* 13:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
* 17:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1375.eqiad.wmnet
* 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet
* 13:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
* 17:31 bblack: lvs2007 - shutting down host - [[phab:T274571|T274571]]
* 13:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 17:27 bblack: lvs2007 - stopping pybal - [[phab:T274571|T274571]]
* 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:26 bblack: lvs2007 - puppet disabled, downtimed in icinga - [[phab:T274571|T274571]]
* 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:28 daniel@deploy1002: Synchronized rpc/: Config: [[gerrit:805775{{!}}rpc: Remove unused RunJobs.php (T175146 T243096)]] (duration: 03m 45s)
* 17:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet
* 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
* 17:07 mutante: mw1375 - powercycle - stuck at reboot
* 13:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 17:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet
* 13:05 moritzm: installing Linux 5.10.120-1~bpo10+1 on buster hosts with backports kernel
* 16:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged
* 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 16:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged
* 13:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 16:38 mutante: mw1368 - File "/usr/lib/python3/dist-packages/spicerack/remote.py", line 637, in _execute  raise RemoteExecutionError(ret, 'Cumin execution failed')
* 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
* 16:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 12:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
* 16:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE
* 12:56 moritzm: installing haproxy security updates on stretch
* 16:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE
* 12:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 16:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE
* 12:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 16:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 12:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
* 16:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE
* 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
* 16:24 ejegg: updated payments-wiki from {{Gerrit|a232fc3438}} to {{Gerrit|4b7b195c8a}}
* 12:43 moritzm: installing python-bottle security updates
* 16:13 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1163 at 1%, again [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14323 and previous config saved to /var/cache/conftool/dbconfig/20210211-161308-kormat.json
* 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
* 15:52 jynus: deploying fixed grants to db1163
* 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 15:50 gehel: ban elastic2054 from shard allocation - [[phab:T274555|T274555]]
* 12:25 moritzm: reset logster-csp/logster-badpass-priv on mwlog1002, these were removed from Puppet
* 15:49 jynus@cumin1001: dbctl commit (dc=all): 'Depool 1163', diff saved to https://phabricator.wikimedia.org/P14321 and previous config saved to /var/cache/conftool/dbconfig/20210211-154902-jynus.json
* 12:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 15:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
* 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 15:46 gehel: depooling elastic2054 - [[phab:T274555|T274555]]
* 12:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 15:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
* 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 15:45 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1163 at 1% [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14320 and previous config saved to /var/cache/conftool/dbconfig/20210211-154501-kormat.json
* 11:59 mbsantos: mbsantos@maps2009 imposm-removebackup-import ([[phab:T305845|T305845]])
* 15:39 gehel: powercycle elastic2054 - [[phab:T274555|T274555]]
* 11:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 15:39 gehel: powercycle elastic2054
* 11:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 14:44 kormat@cumin1001: dbctl commit (dc=all): 'Add db1163 to s1 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14318 and previous config saved to /var/cache/conftool/dbconfig/20210211-144445-kormat.json
* 11:43 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 14:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreams: Update sampling config syntax for test.instrumentation.sampled (duration: 01m 08s)
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for testing', diff saved to https://phabricator.wikimedia.org/P29936 and previous config saved to /var/cache/conftool/dbconfig/20220621-114232-root.json
* 14:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2001.wikimedia.org
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for testing', diff saved to https://phabricator.wikimedia.org/P29935 and previous config saved to /var/cache/conftool/dbconfig/20220621-114216-root.json
* 14:02 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon2001.wikimedia.org
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for testing', diff saved to https://phabricator.wikimedia.org/P29934 and previous config saved to /var/cache/conftool/dbconfig/20220621-114151-root.json
* 13:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 10:57 volans: deleting netbox getstats.GetDeviceStats job results - [[phab:T311048|T311048]]
* 13:48 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 10:51 kart_: Updated cxserver to 2022-06-21-035954-production ([[phab:T307970|T307970]])
* 13:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 10:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 13:41 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 10:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 13:28 godog: test grafana 7.4.1 upgrade on grafana2001 - [[phab:T263747|T263747]]
* 10:47 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 13:27 moritzm: re-adding ganeti5002 to the eqsin Ganeti cluster following mainboard replacement/reinstall [[phab:T261130|T261130]]
* 10:47 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 13:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 10:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 13:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 10:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 13:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 10:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 13:04 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 09:31 urbanecm: 09:29:23 Synchronized wmf-config/throttle.php: {{Gerrit|7c9f6a561b2b4b5c5db063bad83bd23e9cbac347}}: Add a throttle rule for a Czech course ([[phab:T310885|T310885]]) (duration: 03m 34s) #manually logging in logmsgbot's absence
* 13:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 09:20 marostegui: dbmaint s8@eqiad [[phab:T310011|T310011]]
* 13:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 09:13 marostegui: dbmaint s8@codfw [[phab:T310011|T310011]]
* 12:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 08:29 marostegui: Reboot db1120 for kernel upgrade
* 12:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 08:14 moritzm: remove EOLed parsoid debs from releases.wikimedia.org [[phab:T309765|T309765]]
* 12:45 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 05:54 marostegui: Reboot db1132 and db1181 for kernel upgrade
* 12:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
* 12:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2b1df105afd9f9c9c047ae9c0a434674f43d505}}: Changing frwiktionary wmgBabelMainCategory ([[phab:T274137|T274137]]) (duration: 01m 08s)
* 12:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
* 12:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 12:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:662967{{!}}wikidata: post edit constraint jobs on 50% of edits (T204031)]] (up from 40%) (duration: 01m 08s)
* 12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:662970{{!}}wikidata: add Dagbani to wmgExtraLanguageNames (T272242)]] (duration: 01m 29s)
* 12:06 jynus: restart-failed systemd on cumin1001 after s5 eqiad snapshot failed
* 11:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2002.codfw.wmnet
* 11:45 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:41 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 11:40 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:39 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 11:39 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2002.codfw.wmnet
* 11:35 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 11:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2001.codfw.wmnet
* 11:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2001.codfw.wmnet
* 11:25 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1004.eqiad.wmnet
* 11:17 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:13 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:06 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 11:04 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: changed binlog_format [[phab:T274472|T274472]]', diff saved to https://phabricator.wikimedia.org/P14315 and previous config saved to /var/cache/conftool/dbconfig/20210211-110447-kormat.json
* 11:03 moritzm: installing firejail security updates on Stretch
* 10:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
* 10:50 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
* 10:49 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 66%: changed binlog_format [[phab:T274472|T274472]]', diff saved to https://phabricator.wikimedia.org/P14314 and previous config saved to /var/cache/conftool/dbconfig/20210211-104943-kormat.json
* 10:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
* 10:40 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
* 10:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
* 10:34 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 33%: changed binlog_format [[phab:T274472|T274472]]', diff saved to https://phabricator.wikimedia.org/P14313 and previous config saved to /var/cache/conftool/dbconfig/20210211-103440-kormat.json
* 10:33 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
* 10:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 10:20 kormat@cumin1001: dbctl commit (dc=all): 'db1118 depooling: change binlog_format', diff saved to https://phabricator.wikimedia.org/P14312 and previous config saved to /var/cache/conftool/dbconfig/20210211-101959-kormat.json
* 10:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1118.eqiad.wmnet with reason: Depooling to change binlog_format [[phab:T274472|T274472]]
* 10:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1118.eqiad.wmnet with reason: Depooling to change binlog_format [[phab:T274472|T274472]]
* 10:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 10:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
* 10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4025.ulsfo.wmnet
* 10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4031.ulsfo.wmnet
* 10:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2035.codfw.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3059.esams.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3058.esams.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2036.codfw.wmnet
* 10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5005.eqsin.wmnet
* 10:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet
* 10:07 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
* 10:02 jynus: switching db1118 to binlog_format=STATEMENT as new s1 master candidate
* 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
* 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet
* 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4025.ulsfo.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4031.ulsfo.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3059.esams.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3058.esams.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2036.codfw.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2035.codfw.wmnet
* 09:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet
* 09:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1083.eqiad.wmnet
* 09:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thumbor1004.eqiad.wmnet
* 09:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 09:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 09:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2001.codfw.wmnet
* 09:12 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
* 09:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host rpki2001.codfw.wmnet
* 09:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
* 09:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
* 08:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1004.eqiad.wmnet
* 08:59 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1003.eqiad.wmnet
* 08:52 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
* 08:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
* 08:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1003.eqiad.wmnet
* 08:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
* 08:35 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1003.eqiad.wmnet
* 08:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3005.wikimedia.org
* 08:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast3005.wikimedia.org
* 08:11 legoktm@deploy1001: Synchronized php-1.36.0-wmf.30/vendor/wikimedia/shellbox/src/Command/BashWrapper.php: wikimedia/shellbox: Don't unconditionally allowPath( 'limit.sh' ) - [[phab:T274474|T274474]] (duration: 01m 32s)
* 08:09 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
* 08:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
* 08:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
* 07:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
* 07:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
* 07:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1021.eqiad.wmnet
* 07:44 XioNoX: push improved loopback dhcp term to all routers
* 07:39 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1021.eqiad.wmnet
* 07:25 effie: pool thumbor1001
* 07:06 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
* 07:06 elukey: powercycle thumbor1001 - no ssh, no mgmt serial tty available, no racadm getsel info
* 06:45 kart_: Updated cxserver to 2021-02-10-134029-production ([[phab:T274133|T274133]], [[phab:T273456|T273456]], [[phab:T271980|T271980]])
* 06:41 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:35 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:33 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 03:10 rzl@cumin1001: dbctl commit (dc=all): 'depool db1134', diff saved to https://phabricator.wikimedia.org/P14310 and previous config saved to /var/cache/conftool/dbconfig/20210211-031048-rzl.json
* 03:10 rzl: depooled db1134
* 02:18 milimetric@deploy1001: Finished deploy [analytics/refinery@01d811f] (thin): Fix spelling error in mediacounts job (duration: 00m 06s)
* 02:18 milimetric@deploy1001: Started deploy [analytics/refinery@01d811f] (thin): Fix spelling error in mediacounts job
* 02:18 milimetric@deploy1001: Finished deploy [analytics/refinery@01d811f]: Fix spelling error in mediacounts job (duration: 11m 06s)
* 02:07 milimetric@deploy1001: Started deploy [analytics/refinery@01d811f]: Fix spelling error in mediacounts job
* 02:05 dwisehaupt: move payments1* and frpig1* out of maintenance mode
* 02:04 eileen: process-control config revision is {{Gerrit|726db3446a}}
* 02:02 dwisehaupt: move civi1001 out of maintenance mode
* 01:54 eileen: civicrm revision changed from {{Gerrit|3776363c90}} to {{Gerrit|b81cb5e702}}, config revision is {{Gerrit|f216d8fe8e}}
* 01:35 dwisehaupt: applying new civicrm triggers to frdb1002
* 01:14 eileen: civicrm revision changed from {{Gerrit|2ce8194c07}} to {{Gerrit|3776363c90}}, config revision is {{Gerrit|f216d8fe8e}}
* 01:06 dwisehaupt: stopping mariadb replication on frdev1001 and frdb1004
* 01:05 dwisehaupt: Move payments/civi/frpig into maint mode for civi upgrade
* 01:04 eileen: process-control config revision is {{Gerrit|f216d8fe8e}}
* 00:26 legoktm@deploy1001: Synchronized wmf-config/profiler.php: Revert "profiler: Send data to excimer-buster pipeline" (duration: 02m 00s)
* 00:03 milimetric@deploy1001: Finished deploy [analytics/refinery@3da19b6] (thin): More fixes for jobs after cluster upgrade (duration: 00m 07s)
* 00:03 milimetric@deploy1001: Started deploy [analytics/refinery@3da19b6] (thin): More fixes for jobs after cluster upgrade


== 2022-06-20 ==
* 07:14 SandraEbele: Started Airflow 3 Wikidata metrics jobs (Articleplaceholder, Reliability and SpecialEntityData metrics).
* 07:14 SandraEbele: killed Oozie wikidata-articleplaceholder_metrics-coord, wikidata-reliability_metrics-coord, and wikidata-specialentitydata_metrics-coord jobs.

== 2021-02-10 ==
* 23:53 milimetric@deploy1001: Finished deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade (duration: 14m 23s)
* 23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1328.eqiad.wmnet
* 23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet
* 23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1326.eqiad.wmnet
* 23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1325.eqiad.wmnet
* 23:38 milimetric@deploy1001: Started deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade
* 23:36 eileen: civicrm revision changed from {{Gerrit|ae24f87158}} to {{Gerrit|2ce8194c07}}, config revision is {{Gerrit|a48a7db0a2}}
* 22:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1328.eqiad.wmnet
* 22:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet
* 22:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1326.eqiad.wmnet
* 22:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1325.eqiad.wmnet
* 22:32 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d97f7d9]: query_clicks: Remove result file merging (duration: 01m 27s)
* 22:30 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d97f7d9]: query_clicks: Remove result file merging
* 22:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1377.eqiad.wmnet
* 22:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1369.eqiad.wmnet
* 22:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1377.eqiad.wmnet
* 22:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1369.eqiad.wmnet
* 22:07 mutante: mw1369, mw1377 - all servers in this section now consistently fail to reboot when triggered as the last step of the wmf-reimage script
* 21:43 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1328.eqiad.wmnet with reason: REIMAGE
* 21:41 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1327.eqiad.wmnet with reason: REIMAGE
* 21:41 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1328.eqiad.wmnet with reason: REIMAGE
* 21:39 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1326.eqiad.wmnet with reason: REIMAGE
* 21:39 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1327.eqiad.wmnet with reason: REIMAGE
* 21:37 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1325.eqiad.wmnet with reason: REIMAGE
* 21:37 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1326.eqiad.wmnet with reason: REIMAGE
* 21:35 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1325.eqiad.wmnet with reason: REIMAGE
* 21:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1377.eqiad.wmnet with reason: REIMAGE
* 21:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1369.eqiad.wmnet with reason: REIMAGE
* 21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1377.eqiad.wmnet with reason: REIMAGE
* 21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1369.eqiad.wmnet with reason: REIMAGE
* 20:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1293.eqiad.wmnet
* 20:37 eileen: civicrm revision changed from {{Gerrit|f161a34266}} to {{Gerrit|ae24f87158}}, config revision is {{Gerrit|a48a7db0a2}}
* 20:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1293.eqiad.wmnet
* 20:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1370.eqiad.wmnet
* 20:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1378.eqiad.wmnet
* 20:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1378.eqiad.wmnet
* 20:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1370.eqiad.wmnet
* 20:23 mutante: mw1370, mw1378 - powercycling via DRAC
* 20:21 mutante: mw1370, mw1378 - again failing to reboot as the last step of the reimaging script
* 20:19 jgleeson: updated civicrm from {{Gerrit|1e9a86dd6e}} to {{Gerrit|f161a34266}}
* 20:13 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.30 (duration: 01m 02s)
* 20:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.30
* 20:05 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1324.eqiad.wmnet
* 20:01 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1c5477d]: query_clicks: timestamp is now a reserved keyword (duration: 02m 19s)
* 20:01 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1323.eqiad.wmnet
* 20:00 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1322.eqiad.wmnet
* 20:00 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1321.eqiad.wmnet
* 19:59 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c5477d]: query_clicks: timestamp is now a reserved keyword
* 19:54 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1324.eqiad.wmnet
* 19:54 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1323.eqiad.wmnet
* 19:54 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1322.eqiad.wmnet
* 19:54 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1321.eqiad.wmnet
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1293.eqiad.wmnet with reason: REIMAGE
* 19:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1293.eqiad.wmnet with reason: REIMAGE
* 19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1378.eqiad.wmnet with reason: REIMAGE
* 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1378.eqiad.wmnet with reason: REIMAGE
* 19:20 thcipriani@deploy1001: Synchronized wmf-config/ProductionServices.php: [[gerrit:661732{{!}}Remove a couple of useless DNS lookups from mediawiki-config]] [[phab:T231025|T231025]] (duration: 01m 10s)
* 19:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1294.eqiad.wmnet
* 19:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1294.eqiad.wmnet
* 19:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1379.eqiad.wmnet
* 19:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1370.eqiad.wmnet with reason: REIMAGE
* 19:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
* 19:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1370.eqiad.wmnet with reason: REIMAGE
* 19:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1294.eqiad.wmnet
* 19:04 mutante: mw1379 - racadm racreset - host did not come back from reboot, and DRAC says it can't powercycle it while also reporting it is ALREADY ON
* 19:00 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1324.eqiad.wmnet with reason: REIMAGE
* 19:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1379.eqiad.wmnet
* 18:58 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1323.eqiad.wmnet with reason: REIMAGE
* 18:58 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1324.eqiad.wmnet with reason: REIMAGE
* 18:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1371.eqiad.wmnet
* 18:56 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1322.eqiad.wmnet with reason: REIMAGE
* 18:56 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1323.eqiad.wmnet with reason: REIMAGE
* 18:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1321.eqiad.wmnet with reason: REIMAGE
* 18:54 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1322.eqiad.wmnet with reason: REIMAGE
* 18:52 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1321.eqiad.wmnet with reason: REIMAGE
* 18:36 andrew@deploy1001: Finished deploy [horizon/deploy@02cb8a4]: security group dashboard policy updates, now after doing a submodule update! (duration: 03m 31s)
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
* 18:33 andrew@deploy1001: Started deploy [horizon/deploy@02cb8a4]: security group dashboard policy updates, now after doing a submodule update!
* 18:32 andrew@deploy1001: Finished deploy [horizon/deploy@4f5a5a7]: security group dashboard policy updates (duration: 00m 07s)
* 18:32 andrew@deploy1001: Started deploy [horizon/deploy@4f5a5a7]: security group dashboard policy updates
* 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1371.eqiad.wmnet
* 18:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1294.eqiad.wmnet with reason: REIMAGE
* 18:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1294.eqiad.wmnet with reason: REIMAGE
* 18:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thumbor1001.eqiad.wmnet
* 17:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1295.eqiad.wmnet
* 17:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1295.eqiad.wmnet
* 17:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
* 17:18 shdubsh: restart pybal on low-traffic lvs1015
* 17:13 shdubsh: restart pybal on backup lvs1016
* 17:13 andrew@deploy1001: Finished deploy [horizon/deploy@4f5a5a7]: puppet dashboard policy updates (duration: 03m 53s)
* 17:09 andrew@deploy1001: Started deploy [horizon/deploy@4f5a5a7]: puppet dashboard policy updates
* 16:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1295.eqiad.wmnet with reason: REIMAGE
* 16:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1295.eqiad.wmnet with reason: REIMAGE
* 16:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1379.eqiad.wmnet with reason: REIMAGE
* 16:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1379.eqiad.wmnet with reason: REIMAGE
* 16:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1371.eqiad.wmnet with reason: REIMAGE
* 16:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1371.eqiad.wmnet with reason: REIMAGE
* 16:20 moritzm: installing unzip security updates
* 16:12 moritzm: installing atftp security updates
* 16:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 16:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 15:26 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Do not produce canary events for rdf-streaming-updater streams - [[phab:T269619|T269619]] (duration: 01m 13s)
* 15:11 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.30
* 15:05 hashar: group0 wikis to 1.36.0-wmf.30 [[phab:T271344|T271344]]
* 14:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2033.codfw.wmnet
* 14:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
* 14:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3057.esams.wmnet
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4030.ulsfo.wmnet
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3056.esams.wmnet
* 14:51 jynus: updating puppet-compiler-facts
* 14:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
* 14:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2034.codfw.wmnet
* 14:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2033.codfw.wmnet
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4030.ulsfo.wmnet
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3057.esams.wmnet
* 14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3056.esams.wmnet
* 14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2034.codfw.wmnet
* 14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
* 14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
* 12:26 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T269619|T269619]]: [wdqs] Add flink sideoutput stream definitions (duration: 01m 06s)
* 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:658321{{!}}Remove Wikibase.NewItemIdFormatter log channel (T268870)]] 2/2 (prod no-op) (duration: 01m 08s)
* 12:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:658321{{!}}Remove Wikibase.NewItemIdFormatter log channel (T268870)]] 1/2 (duration: 01m 07s)
* 12:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e8214ee812f3812f609c26d6422b85a99a91e1f6}}: Enable GrowthExperiments on bnwiki ([[phab:T266020|T266020]]) (duration: 01m 08s)
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2d8cb10f246904f1af07b019da270fd8dc7816fa}}: Set wgGEHelpPanelAskMentor to true for several wikis ([[phab:T272753|T272753]]) (duration: 01m 21s)
* 12:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5003.eqsin.wmnet
* 12:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4029.ulsfo.wmnet
* 11:56 vgutierrez: powercycle cp5003
* 11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3055.esams.wmnet
* 11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5009.eqsin.wmnet
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3054.esams.wmnet
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
* 11:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5003.eqsin.wmnet
* 11:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5009.eqsin.wmnet
* 11:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4029.ulsfo.wmnet
* 11:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3055.esams.wmnet
* 11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3054.esams.wmnet
* 11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
* 11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
* 11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
* 11:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
* 11:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4023.ulsfo.wmnet
* 11:22 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1404.eqiad.wmnet
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4023.ulsfo.wmnet
* 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5008.eqsin.wmnet
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14301 and previous config saved to /var/cache/conftool/dbconfig/20210210-104649-root.json
* 10:42 vgutierrez: powercycle cp5008
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4028.ulsfo.wmnet
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5002.eqsin.wmnet
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4022.ulsfo.wmnet
* 10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
* 10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2030.codfw.wmnet
* 10:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet
* 10:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2029.codfw.wmnet
* 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
* 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 80%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14300 and previous config saved to /var/cache/conftool/dbconfig/20210210-103146-root.json
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5008.eqsin.wmnet
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4022.ulsfo.wmnet
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4028.ulsfo.wmnet
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet
* 10:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
* 10:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2030.codfw.wmnet
* 10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2029.codfw.wmnet
* 10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
* 10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14299 and previous config saved to /var/cache/conftool/dbconfig/20210210-101642-root.json
* 10:16 moritzm: installing firejail security updates
* 10:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4021.ulsfo.wmnet
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14298 and previous config saved to /var/cache/conftool/dbconfig/20210210-100139-root.json
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14297 and previous config saved to /var/cache/conftool/dbconfig/20210210-100111-root.json
* 10:00 vgutierrez: power cycling cp4021
* 09:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5007.eqsin.wmnet
* 09:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
* 09:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
* 09:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
* 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
* 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2027.codfw.wmnet
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14296 and previous config saved to /var/cache/conftool/dbconfig/20210210-094635-root.json
* 09:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14295 and previous config saved to /var/cache/conftool/dbconfig/20210210-094608-root.json
* 09:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1075.eqiad.wmnet
* 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5007.eqsin.wmnet
* 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4021.ulsfo.wmnet
* 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
* 09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
* 09:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
* 09:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
* 09:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2027.codfw.wmnet
* 09:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
* 09:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1075.eqiad.wmnet
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14294 and previous config saved to /var/cache/conftool/dbconfig/20210210-093132-root.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14293 and previous config saved to /var/cache/conftool/dbconfig/20210210-093104-root.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14292 and previous config saved to /var/cache/conftool/dbconfig/20210210-093011-root.json
* 09:23 vgutierrez: rolling restart of cp nodes to catch up on kernel upgrades
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14290 and previous config saved to /var/cache/conftool/dbconfig/20210210-091601-root.json
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 80%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14289 and previous config saved to /var/cache/conftool/dbconfig/20210210-091507-root.json
* 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 09:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 10%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14288 and previous config saved to /var/cache/conftool/dbconfig/20210210-090057-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14287 and previous config saved to /var/cache/conftool/dbconfig/20210210-090004-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14286 and previous config saved to /var/cache/conftool/dbconfig/20210210-084500-root.json
* 08:41 legoktm: depooling mw1404.eqiad.wmnet for perf benchmarking ([[phab:T274041|T274041]])
* 08:41 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14285 and previous config saved to /var/cache/conftool/dbconfig/20210210-082957-root.json
* 08:19 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - [[phab:T272836|T272836]]
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14284 and previous config saved to /var/cache/conftool/dbconfig/20210210-081453-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14283 and previous config saved to /var/cache/conftool/dbconfig/20210210-080512-marostegui.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1170:3312, db1170:3317 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14282 and previous config saved to /var/cache/conftool/dbconfig/20210210-064330-marostegui.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1170:3312, db1170:3317 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14281 and previous config saved to /var/cache/conftool/dbconfig/20210210-063534-marostegui.json
* 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1170:3312, db1170:3317 with minimal weight for the first time [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14279 and previous config saved to /var/cache/conftool/dbconfig/20210210-061924-marostegui.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1170:3312 and db1170:3317 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14278 and previous config saved to /var/cache/conftool/dbconfig/20210210-061638-marostegui.json
* 06:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1020.eqiad.wmnet
* 06:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1020.eqiad.wmnet
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 to clone db1162 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14277 and previous config saved to /var/cache/conftool/dbconfig/20210210-055846-marostegui.json
* 03:46 ryankemper: `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service`
* 01:54 krinkle@deploy1001: Finished deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - {{Gerrit|Ib67da94fb1bdf0}} (duration: 00m 06s)
* 01:54 krinkle@deploy1001: Started deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - {{Gerrit|Ib67da94fb1bdf0}}
* 01:43 krinkle@deploy1001: Finished deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - {{Gerrit|Ibf28e02ec03}} (duration: 00m 06s)
* 01:43 krinkle@deploy1001: Started deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - {{Gerrit|Ibf28e02ec03}}
* 01:06 milimetric@deploy1001: Finished deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade (duration: 00m 06s)
* 01:06 milimetric@deploy1001: Started deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade
* 01:06 milimetric@deploy1001: Finished deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade (duration: 10m 55s)
* 00:58 mutante: doc1001 - reloaded apache2
* 00:55 milimetric@deploy1001: Started deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade
* 00:42 Amir1: changing frwiki to wmf.30 in mwdebug1002 to test [[phab:T264391|T264391]]
* 00:33 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/FeaturedFeeds: [[gerrit:662965{{!}}Fix issues with recent caching update]] ([[phab:T264391|T264391]]) (duration: 01m 10s)
* 00:22 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.30 (duration: 24m 10s)
* 00:01 twentyafterfour: train status: wmf.28 and wmf.29 are undeployed.  wmf.27 is everywhere with the exception of testwikis which is at wmf.30 refs [[phab:T271344|T271344]]


== 2022-06-19 ==
* 10:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: depooled
* 10:14 ayounsi@cumin1001: dbctl commit (dc=all): 'depool', diff saved to https://phabricator.wikimedia.org/P29910 and previous config saved to /var/cache/conftool/dbconfig/20220619-101436-ayounsi.json

== 2021-02-09 ==
* 23:58 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.30
* 23:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
* 23:55 ryankemper: Depooled `wdqs1005` - it's catching up on hours of lag
* 23:55 twentyafterfour@deploy1001: Finished scap: (no justification provided) (duration: 08m 43s)
* 23:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2250.codfw.wmnet
* 23:50 mutante: mw1383,mw1385 - scap pull, php
* 23:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1296.eqiad.wmnet
* 23:47 twentyafterfour: running scap sync-world
* 23:47 twentyafterfour@deploy1001: Started scap: (no justification provided)
* 23:46 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 23:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1296.eqiad.wmnet
* 23:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1380.eqiad.wmnet
* 23:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1380.eqiad.wmnet
* 23:28 mutante: mw1380 - powercycling after it did not come back from normal reboot during reimaging
* 23:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1372.eqiad.wmnet
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1372.eqiad.wmnet
* 23:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2250.codfw.wmnet with reason: REIMAGE
* 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2250.codfw.wmnet with reason: REIMAGE
* 22:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1296.eqiad.wmnet with reason: REIMAGE
* 22:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1296.eqiad.wmnet with reason: REIMAGE
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1372.eqiad.wmnet with reason: REIMAGE
* 22:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1372.eqiad.wmnet with reason: REIMAGE
* 22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2259.codfw.wmnet
* 22:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2259.codfw.wmnet
* 22:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1373.eqiad.wmnet
* 22:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1373.eqiad.wmnet
* 22:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
* 22:23 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
* 22:23 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GlobalWatchlist extension on testwiki ([[phab:T260862|T260862]]) (duration: 02m 51s)
* 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2259.codfw.wmnet with reason: REIMAGE
* 22:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1380.eqiad.wmnet with reason: REIMAGE
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2259.codfw.wmnet with reason: REIMAGE
* 21:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1373.eqiad.wmnet with reason: REIMAGE
* 21:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1380.eqiad.wmnet with reason: REIMAGE
* 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1373.eqiad.wmnet with reason: REIMAGE
* 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2260.codfw.wmnet
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1381.eqiad.wmnet
* 21:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1384.eqiad.wmnet
* 21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2260.codfw.wmnet
* 21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1384.eqiad.wmnet
* 21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1381.eqiad.wmnet
* 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1298.eqiad.wmnet with reason: REIMAGE
* 21:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1298.eqiad.wmnet with reason: REIMAGE
* 21:10 elukey: Analytics Hadoop cluster upgrade completed
* 21:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2260.codfw.wmnet with reason: REIMAGE
* 21:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1381.eqiad.wmnet with reason: REIMAGE
* 21:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1384.eqiad.wmnet with reason: REIMAGE
* 21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2260.codfw.wmnet with reason: REIMAGE
* 21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1381.eqiad.wmnet with reason: REIMAGE
* 21:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1384.eqiad.wmnet with reason: REIMAGE
* 20:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1299.eqiad.wmnet
* 20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
* 20:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2263.codfw.wmnet
* 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1382.eqiad.wmnet
* 20:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1385.eqiad.wmnet
* 20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1382.eqiad.wmnet
* 20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1385.eqiad.wmnet
* 20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2263.codfw.wmnet
* 20:21 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:13 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 20:12 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 20:12 otto@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - otto@cumin1001
* 20:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 20:11 otto@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - otto@cumin1001
* 20:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 20:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1299.eqiad.wmnet with reason: REIMAGE
* 20:08 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 20:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1299.eqiad.wmnet with reason: REIMAGE
* 20:06 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1385.eqiad.wmnet with reason: REIMAGE
* 20:00 twentyafterfour: prepping 1.36.0-wmf.30
* 20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1382.eqiad.wmnet with reason: REIMAGE
* 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1385.eqiad.wmnet with reason: REIMAGE
* 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2263.codfw.wmnet with reason: REIMAGE
* 19:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1382.eqiad.wmnet with reason: REIMAGE
* 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2263.codfw.wmnet with reason: REIMAGE
* 19:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2264.codfw.wmnet
* 19:35 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1383.eqiad.wmnet
* 19:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 19:23 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 19:21 ryankemper: [[phab:T262211|T262211]] `sudo cumin 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` on `ryankemper@cumin1001`
* 19:19 ryankemper: [[phab:T262211|T262211]] Attempting to bring `relforge100[3,4]` into service; merging https://gerrit.wikimedia.org/r/661229
* 19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
* 19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2220.codfw.wmnet
* 19:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 19:08 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 19:04 elukey@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - elukey@cumin1001
* 19:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - elukey@cumin1001
* 19:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 19:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1383.eqiad.wmnet
* 19:01 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 19:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2264.codfw.wmnet
* 18:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:57 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:46 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:45 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:42 ryankemper: [[phab:T267927|T267927]] [WDQS Data Reload] `sudo cookbook sre.wdqs.data-reload wdqs1010.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --task-id [[phab:T267927|T267927]]` on `ryankemper@cumin1001` tmux session `wdqs_data_reload_1010`
* 18:41 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:40 ryankemper: [[phab:T267927|T267927]] [WDQS Data Reload] `sudo cookbook sre.wdqs.data-reload wdqs1009.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --skolemize --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --task-id [[phab:T267927|T267927]]` on `ryankemper@cumin1001` tmux session `wdqs_data_reload_1009`
* 18:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 18:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 18:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:37 ryankemper: [[phab:T267927|T267927]] [WDQS Data Reload] Clearing old wikidata journal file to free disk space before beginning data reload: `sudo systemctl status wdqs-blazegraph && sudo systemctl stop wdqs-blazegraph && sudo rm -fv /srv/wdqs/wikidata.jnl && sudo systemctl start wdqs-blazegraph` on `wdqs100[9,10]`
* 18:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1300.eqiad.wmnet
* 18:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
* 18:32 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:29 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 18:14 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 17:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 17:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 17:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1300.eqiad.wmnet with reason: REIMAGE
* 17:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2220.codfw.wmnet with reason: REIMAGE
* 17:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1300.eqiad.wmnet with reason: REIMAGE
* 17:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2220.codfw.wmnet with reason: REIMAGE
* 17:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 17:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 17:01 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 16:47 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.29
* 16:21 moritzm: installing wireshark security updates
* 16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - [[phab:T272836|T272836]]
* 16:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:59 volker-e@deploy1001: Finished deploy [design/style-guide@b9b7ee6]: Deploy design/style-guide: {{Gerrit|b9b7ee6}} “Components”: Fix components overview SVG rendering glitch (#439) (duration: 00m 07s)
* 15:59 volker-e@deploy1001: Started deploy [design/style-guide@b9b7ee6]: Deploy design/style-guide: {{Gerrit|b9b7ee6}} “Components”: Fix components overview SVG rendering glitch (#439)
* 15:32 papaul: power down logstash2035 for relocation
* 15:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 95 hosts with reason: upgrading openstack
* 15:22 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 95 hosts with reason: upgrading openstack
* 15:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 95 hosts with reason: upgrading openstack
* 15:22 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
* 15:22 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
* 15:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 95 hosts with reason: upgrading openstack
* 15:15 papaul: power down mw2220 for maintenance
* 15:11 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.29 (duration: 01m 11s)
* 15:10 moritzm: readding ganeti5002 to the eqsin Ganeti cluster following mainboard replacement/reinstall [[phab:T261130|T261130]]
* 15:10 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.29
* 15:06 hashar@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/FeaturedFeeds: Revert "Caching fixes" [[phab:T264391|T264391]] (duration: 01m 25s)
* 14:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
* 14:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14270 and previous config saved to /var/cache/conftool/dbconfig/20210209-145206-root.json
* 14:50 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2001.codfw.wmnet
* 14:48 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host pybal-test2001.codfw.wmnet
* 14:43 gehel: rebooting wdqs1009 / 1010 for kernel upgrade
* 14:37 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.36.0-wmf.29"
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 85%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14269 and previous config saved to /var/cache/conftool/dbconfig/20210209-143703-root.json
* 14:29 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.29 (duration: 01m 06s)
* 14:28 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.29
* 14:26 volans: cd /srv/external-monitoring; git fetch/status/pull on wikitech-static - [[phab:T273951|T273951]]
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14268 and previous config saved to /var/cache/conftool/dbconfig/20210209-142159-root.json
* 14:21 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.29
* 14:14 gehel: depooling wdqs1005, catching up on lag
* 14:10 hashar@deploy1001: Synchronized php-1.36.0-wmf.29/includes/libs/objectcache/wancache/WANObjectCache.php: WANObjectCache: throw on Closure - [[phab:T273242|T273242]] (duration: 01m 08s)
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14267 and previous config saved to /var/cache/conftool/dbconfig/20210209-140655-root.json
* 13:52 Urbanecm: Deploy security patch ([[phab:T274152|T274152]])
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14266 and previous config saved to /var/cache/conftool/dbconfig/20210209-135152-root.json
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14265 and previous config saved to /var/cache/conftool/dbconfig/20210209-133648-root.json
* 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 30%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14264 and previous config saved to /var/cache/conftool/dbconfig/20210209-132145-root.json
* 13:08 twentyafterfour: restart phabricator daemons to free 3.5gb of ram (memory leak?)
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14263 and previous config saved to /var/cache/conftool/dbconfig/20210209-130641-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14262 and previous config saved to /var/cache/conftool/dbconfig/20210209-125138-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 15%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14261 and previous config saved to /var/cache/conftool/dbconfig/20210209-123634-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 13%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14260 and previous config saved to /var/cache/conftool/dbconfig/20210209-122131-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14259 and previous config saved to /var/cache/conftool/dbconfig/20210209-120627-root.json
* 12:05 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 12:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop analytics cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2010.codfw.wmnet
* 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2008.codfw.wmnet
* 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2006.codfw.wmnet
* 11:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2005.codfw.wmnet
* 11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
* 11:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1010.eqiad.wmnet
* 11:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1008.eqiad.wmnet
* 11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1007.eqiad.wmnet
* 11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1006.eqiad.wmnet
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 8%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14258 and previous config saved to /var/cache/conftool/dbconfig/20210209-115124-root.json
* 11:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
* 11:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1001.eqiad.wmnet
* 11:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
* 11:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 5%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14257 and previous config saved to /var/cache/conftool/dbconfig/20210209-113620-root.json
* 11:34 elukey: start the upgrade process for Hadoop Analytics
* 11:33 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop analytics cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
* 11:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
* 11:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 4%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14256 and previous config saved to /var/cache/conftool/dbconfig/20210209-112116-root.json
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
* 11:17 vgutierrez: rolling restart of eqiad LVS instances to catch up on kernel upgrades
* 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 3%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14255 and previous config saved to /var/cache/conftool/dbconfig/20210209-110613-root.json
* 11:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
* 10:57 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 10:57 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 10:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
* 10:53 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2001.codfw.wmnet
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 2%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14254 and previous config saved to /var/cache/conftool/dbconfig/20210209-105109-root.json
* 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
* 10:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
* 10:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
* 10:41 vgutierrez: rolling restart of esams LVS instances to catch up on kernel upgrades
* 10:40 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2001.codfw.wmnet
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 100%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14253 and previous config saved to /var/cache/conftool/dbconfig/20210209-103443-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 100%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14252 and previous config saved to /var/cache/conftool/dbconfig/20210209-103414-root.json
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1157 for the first time in s3 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14251 and previous config saved to /var/cache/conftool/dbconfig/20210209-102109-marostegui.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 75%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14250 and previous config saved to /var/cache/conftool/dbconfig/20210209-101939-root.json
* 10:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1019.eqiad.wmnet
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 75%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14249 and previous config saved to /var/cache/conftool/dbconfig/20210209-101911-root.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1157 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14248 and previous config saved to /var/cache/conftool/dbconfig/20210209-101556-marostegui.json
* 10:13 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1019.eqiad.wmnet
* 10:12 gehel@cumin1001: START - Cookbook sre.wdqs.reboot
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 50%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14247 and previous config saved to /var/cache/conftool/dbconfig/20210209-100436-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 50%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14246 and previous config saved to /var/cache/conftool/dbconfig/20210209-100407-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 25%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14245 and previous config saved to /var/cache/conftool/dbconfig/20210209-094932-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 25%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14244 and previous config saved to /var/cache/conftool/dbconfig/20210209-094904-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 10%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14243 and previous config saved to /var/cache/conftool/dbconfig/20210209-093429-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 10%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14242 and previous config saved to /var/cache/conftool/dbconfig/20210209-093400-root.json
* 09:22 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - [[phab:T272836|T272836]]
* 08:44 XioNoX: repool esams - [[phab:T272342|T272342]]
* 08:30 XioNoX: rollback redirect ns2 to authdns1001 - [[phab:T252631|T252631]]
* 08:09 XioNoX: alright, brace yourself, esams switch stack is going to go down
* 08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 32 hosts with reason: switch upgrade
* 08:02 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on 32 hosts with reason: switch upgrade
* 07:54 XioNoX: redirect ns2 to authdns1001 - [[phab:T252631|T252631]]
* 07:47 hashar@deploy1001: Finished deploy [integration/docroot@672e79f]: build: Add /scap/log to gitignore (duration: 00m 06s)
* 07:47 hashar@deploy1001: Started deploy [integration/docroot@672e79f]: build: Add /scap/log to gitignore
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1081 from dbctl [[phab:T273040|T273040]]', diff saved to https://phabricator.wikimedia.org/P14241 and previous config saved to /var/cache/conftool/dbconfig/20210209-073455-marostegui.json
* 07:20 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14240 and previous config saved to /var/cache/conftool/dbconfig/20210209-072038-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14239 and previous config saved to /var/cache/conftool/dbconfig/20210209-070534-root.json
* 07:04 XioNoX: depool disable 2 uplinks on asw2-esams - [[phab:T272342|T272342]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14238 and previous config saved to /var/cache/conftool/dbconfig/20210209-065031-root.json
* 06:48 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 06:48 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:48 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:47 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@582b070]: 0.3.63 (duration: 06m 46s)
* 06:44 XioNoX: depool esams for network maintenance - [[phab:T272342|T272342]]
* 06:41 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.63` on canary `wdqs1003`; proceeding to rest of fleet
* 06:40 ryankemper@deploy1001: Started deploy [wdqs/wdqs@582b070]: 0.3.63
* 06:40 ryankemper: Pooled `wdqs1007` and depooled `wdqs1005` (`1005` is ~12 hours behind)
* 06:38 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.63`. Pre-deploy tests passing on canary `wdqs1003`
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14237 and previous config saved to /var/cache/conftool/dbconfig/20210209-063527-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14236 and previous config saved to /var/cache/conftool/dbconfig/20210209-062024-root.json
* 06:20 marostegui: Stop mysql on s2 and s7 on db1090 to clone db1170 [[phab:T258361|T258361]]
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14234 and previous config saved to /var/cache/conftool/dbconfig/20210209-061822-marostegui.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14233 and previous config saved to /var/cache/conftool/dbconfig/20210209-060520-root.json
* 05:02 krinkle@deploy1001: Finished deploy [integration/docroot@fdfb265]: {{Gerrit|I271e6054880}}, [[phab:T273247|T273247]] (duration: 00m 06s)
* 05:02 krinkle@deploy1001: Started deploy [integration/docroot@fdfb265]: {{Gerrit|I271e6054880}}, [[phab:T273247|T273247]]
* 01:56 tstarling@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/FeaturedFeeds: probable fix for UBN [[phab:T273242|T273242]] (duration: 01m 06s)
* 01:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1302.eqiad.wmnet
* 01:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1302.eqiad.wmnet
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1301.eqiad.wmnet
* 00:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1387.eqiad.wmnet
* 00:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1386.eqiad.wmnet
* 00:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1386.eqiad.wmnet
* 00:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1387.eqiad.wmnet
* 00:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1301.eqiad.wmnet with reason: REIMAGE
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1302.eqiad.wmnet with reason: REIMAGE


== 2021-02-08 ==
== 2022-06-17 ==
* 23:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1301.eqiad.wmnet with reason: REIMAGE
* 22:05 AndyRussG: update payments-wiki revision {{Gerrit|10304f69}} -> {{Gerrit|ef53c82e}}
* 23:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1302.eqiad.wmnet with reason: REIMAGE
* 20:22 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P29908 and previous config saved to /var/cache/conftool/dbconfig/20220617-202240-jynus.json
* 23:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2220.codfw.wmnet with reason: [[phab:T273803|T273803]]
* 20:20 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P29907 and previous config saved to /var/cache/conftool/dbconfig/20220617-202038-jynus.json
* 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2220.codfw.wmnet with reason: [[phab:T273803|T273803]]
* 17:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1021.eqiad.wmnet with OS buster
* 23:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2220.codfw.wmnet
* 17:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 23:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
* 17:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
* 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1386.eqiad.wmnet with reason: REIMAGE
* 16:49 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1020.eqiad.wmnet with OS buster
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1386.eqiad.wmnet with reason: REIMAGE
* 16:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS buster
* 23:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1387.eqiad.wmnet with reason: REIMAGE
* 16:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1019.eqiad.wmnet with OS buster
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1387.eqiad.wmnet with reason: REIMAGE
* 16:37 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 23:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1388.eqiad.wmnet
* 16:35 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 23:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1303.eqiad.wmnet
* 16:34 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 23:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2245.codfw.wmnet
* 16:34 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 23:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet
* 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
* 22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1274.eqiad.wmnet
* 16:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1273.eqiad.wmnet
* 16:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1272.eqiad.wmnet
* 16:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1271.eqiad.wmnet
* 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2245.codfw.wmnet
* 16:22 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
* 22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1388.eqiad.wmnet
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS buster
* 22:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1303.eqiad.wmnet
* 16:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 22:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet
* 16:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1274.eqiad.wmnet
* 16:06 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1273.eqiad.wmnet
* 16:06 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
* 22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1271.eqiad.wmnet
* 16:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 21:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1303.eqiad.wmnet with reason: REIMAGE
* 15:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1303.eqiad.wmnet with reason: REIMAGE
* 15:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 21:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2245.codfw.wmnet with reason: REIMAGE
* 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2245.codfw.wmnet with reason: REIMAGE
* 15:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 21:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
* 15:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 21:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
* 15:46 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS buster
* 21:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1388.eqiad.wmnet with reason: REIMAGE
* 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 21:29 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1273.eqiad.wmnet with reason: reimaging
* 15:39 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1019.eqiad.wmnet with OS buster
* 21:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1273.eqiad.wmnet with reason: reimaging
* 15:39 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS buster
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1271.eqiad.wmnet with reason: reimaging
* 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 21:28 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1271.eqiad.wmnet with reason: reimaging
* 15:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1274.eqiad.wmnet with reason: REIMAGE
* 15:32 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 21:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1388.eqiad.wmnet with reason: REIMAGE
* 15:31 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS buster
* 21:25 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1272.eqiad.wmnet with reason: REIMAGE
* 15:29 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 21:25 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1274.eqiad.wmnet with reason: REIMAGE
* 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 21:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1304.eqiad.wmnet
* 15:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 21:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1271.eqiad.wmnet with reason: REIMAGE
* 15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 21:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1273.eqiad.wmnet with reason: REIMAGE
* 15:19 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 21:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1271.eqiad.wmnet with reason: REIMAGE
* 15:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti4004.mgmt.ulsfo.wmnet with reboot policy GRACEFUL
* 21:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1273.eqiad.wmnet with reason: REIMAGE
* 15:18 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1272.eqiad.wmnet with reason: REIMAGE
* 15:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1305.eqiad.wmnet
* 15:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 21:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1304.eqiad.wmnet
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS buster
* 21:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1305.eqiad.wmnet
* 15:16 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 21:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1389.eqiad.wmnet
* 15:15 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS buster
* 21:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1389.eqiad.wmnet
* 15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1390.eqiad.wmnet
* 15:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 21:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1390.eqiad.wmnet
* 15:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS buster
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1304.eqiad.wmnet with reason: REIMAGE
* 15:02 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 20:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1304.eqiad.wmnet with reason: REIMAGE
* 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1305.eqiad.wmnet with reason: REIMAGE
* 14:59 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1305.eqiad.wmnet with reason: REIMAGE
* 14:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 20:20 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1389.eqiad.wmnet with reason: REIMAGE
* 14:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 20:17 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Undo migration of SpecialMuteSubmit on all wikis except testwiki - [[phab:T268517|T268517]] (duration: 01m 06s)
* 14:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 20:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1389.eqiad.wmnet with reason: REIMAGE
* 14:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:16 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:24 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1390.eqiad.wmnet with reason: REIMAGE
* 12:35 SandraEbele: deployed daily airflow dag for 3 Wikidata metrics.
* 20:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1390.eqiad.wmnet with reason: REIMAGE
* 11:54 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@18182aa]: (no justification provided) (duration: 00m 13s)
* 20:11 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:54 ebysans@deploy1002: Started deploy [airflow-dags/analytics@18182aa]: (no justification provided)
* 19:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1391.eqiad.wmnet
* 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2012.codfw.wmnet
* 19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2012.codfw.wmnet
* 19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 11:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2011.codfw.wmnet
* 19:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 11:40 moritzm: upload cas 6.5.5+wmf11u1 to apt.wikimedia.org [[phab:T305518|T305518]]
* 19:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1391.eqiad.wmnet
* 11:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2011.codfw.wmnet
* 19:48 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ca9bba1]: cirrus_namespace_map: only overwrite on success (duration: 01m 19s)
* 11:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2010.codfw.wmnet
* 19:47 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ca9bba1]: cirrus_namespace_map: only overwrite on success
* 11:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 19:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 11:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 19:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 11:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 19:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 11:33 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 19:20 urbanecm@deploy1001: Synchronized wmf-config/config/dawiki.yaml: {{Gerrit|3f39eefaa4c0dabfbc5b03fdc1b12e48913089bd}}: Enable GrowthExperiments at dawiki ([[phab:T256126|T256126]]; 3/3) (duration: 01m 04s)
* 11:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 19:18 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: {{Gerrit|3f39eefaa4c0dabfbc5b03fdc1b12e48913089bd}}: Enable GrowthExperiments at dawiki ([[phab:T256126|T256126]]; 2/3) (duration: 01m 03s)
* 11:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 19:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f39eefaa4c0dabfbc5b03fdc1b12e48913089bd}}: Enable GrowthExperiments at dawiki ([[phab:T256126|T256126]]) (duration: 01m 05s)
* 11:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2010.codfw.wmnet
* 19:13 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1012.eqiad.wmnet
* 19:11 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1012.eqiad.wmnet
* 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1391.eqiad.wmnet with reason: REIMAGE
* 11:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1011.eqiad.wmnet
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1391.eqiad.wmnet with reason: REIMAGE
* 11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1011.eqiad.wmnet
* 19:08 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1010.eqiad.wmnet
* 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3e94e2177b7f31bea1c6bc21b272a4529a38b4b3}}: Make DiscussionTools newtopictool available on testwiki (duration: 01m 07s)
* 11:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1010.eqiad.wmnet
* 18:52 mutante: mw1391 - reimaging
* 10:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2037.codfw.wmnet
* 10:35 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 18:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@a458845]: Add trwikivoyage [[phab:T271262|T271262]] and restore restbase2009 (duration: 17m 13s)
* 10:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2037.codfw.wmnet
* 10:34 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 18:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2036.codfw.wmnet
* 10:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 18:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
* 10:32 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 18:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2035.codfw.wmnet
* 10:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
* 18:16 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2035.codfw.wmnet
* 09:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
* 18:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2034.codfw.wmnet
* 09:56 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:12 ppchelko@deploy1001: Started deploy [restbase/deploy@a458845]: Add trwikivoyage [[phab:T271262|T271262]] and restore restbase2009
* 09:56 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 18:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2034.codfw.wmnet
* 09:55 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2033.codfw.wmnet
* 09:55 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 18:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2033.codfw.wmnet
* 09:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 17:57 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
* 09:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 17:57 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
* 17:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2032.codfw.wmnet
* 09:44 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
* 17:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2032.codfw.wmnet
* 09:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
* 17:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2031.codfw.wmnet
* 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1004.eqiad.wmnet
* 17:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2031.codfw.wmnet
* 09:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
* 17:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2030.codfw.wmnet
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1004.eqiad.wmnet
* 17:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2030.codfw.wmnet
* 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
* 17:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2029.codfw.wmnet
* 09:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
* 17:23 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings - Add eventgate-analytics-external - [[phab:T272863|T272863]] (no-op) (duration: 01m 06s)
* 09:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
* 17:21 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: ProductionServices - Add eventgate-analytics-external - [[phab:T272863|T272863]] (no-op) (duration: 01m 06s)
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
* 17:20 otto@deploy1001: sync-file aborted: ProductionServices - Add eventgate-analytics-external - [[phab:T272863|T272863]] (no-op) (duration: 00m 02s)
* 09:24 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
* 17:20 otto@deploy1001: Synchronized wmf-config/LabsServices.php: LabsServices - Add eventgate-analytics-external - [[phab:T272998|T272998]] (duration: 01m 08s)
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
* 17:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2029.codfw.wmnet
* 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 17:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2027.codfw.wmnet
* 09:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 17:12 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2027.codfw.wmnet
* 09:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 17:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2026.codfw.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
* 17:06 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2026.codfw.wmnet
* 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
* 16:30 XioNoX: adding option-82 to all prod vlans DHCP - [[phab:T269855|T269855]]
* 09:11 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 16:02 Urbanecm: Deploy security patch ([[phab:T71367|T71367]])
* 09:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 15:49 gehel: repool wdqs1012 - caught up on lag
* 09:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 15:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 15:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 08:51 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 15:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
* 08:47 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 15:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on maps1001.eqiad.wmnet with reason: Server being relocated
* 08:39 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 15:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on maps1001.eqiad.wmnet with reason: Server being relocated
* 08:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 15:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
* 08:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 15:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
* 08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti4004.ulsfo.wmnet with reason: Enable virt in BIOS
* 15:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 08:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
* 15:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
* 08:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
* 15:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1002.wikimedia.org
* 08:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
* 15:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
* 15:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1002.wikimedia.org
* 07:41 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 15:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 07:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-staging-ctrl[2001-2002].codfw.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 15:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1001.wikimedia.org
* 02:51 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS bullseye
* 15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1001.wikimedia.org
* 02:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 15:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2004.wikimedia.org
* 02:36 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
* 15:13 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Wikibase/build/travis/install.sh: Backport: [[gerrit:662669{{!}}Fix Travis CI build on release branches]] (prod no-op, syncing only to avoid drift) (duration: 01m 08s)
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 15:11 ottomata: set kafka topic retention to 31 days for (eqiad{{!}}codfw.rdf-streaming-updater.mutation) in kafka main-eqiad and main-codfw - [[phab:T269619|T269619]]
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 15:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2004.wikimedia.org
* 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2003.wikimedia.org
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 15:04 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1001.eqiad.wmnet
* 02:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 03m 43s)
* 15:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on maps1001.eqiad.wmnet with reason: Server being relocated
* 02:02 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS bullseye
* 15:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on maps1001.eqiad.wmnet with reason: Server being relocated
* 01:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS bullseye
* 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2003.wikimedia.org
* 01:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 14:50 herron: stopped ES on logstash1020 in prep for re-rack [[phab:T273984|T273984]]
* 01:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
* 14:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
* 01:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
* 14:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
* 00:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2026.codfw.wmnet
* 00:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
* 00:39 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
* 00:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
* 14:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2026.codfw.wmnet
* 14:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2024.codfw.wmnet
* 14:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
* 14:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2023.codfw.wmnet
* 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2023.codfw.wmnet
* 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2022.codfw.wmnet
* 14:08 Urbanecm: Deploy security patch for [[phab:T223654|T223654]]
* 14:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2022.codfw.wmnet
* 14:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2021.codfw.wmnet
* 13:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2021.codfw.wmnet
* 13:54 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - [[phab:T272836|T272836]]
* 13:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5001.eqsin.wmnet
* 13:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5001.eqsin.wmnet
* 13:09 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/SyntaxHighlight_GeSHi/modules/pygments.wrapper.less: [[gerrit:662668{{!}}Move position:relative to inner wrapper]] ([[phab:T272853|T272853]]) (duration: 01m 08s)
* 13:06 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Wikibase/repo/includes/Store/Sql/SqlChangeDispatchCoordinator.php: [[gerrit:662666{{!}}Cast chd_seen as signed integer]] (duration: 01m 10s)
* 12:55 daniel@deploy1001: Synchronized php-1.36.0-wmf.29/includes/libs/objectcache/wancache/WANObjectCache.php: Backport: [[gerrit:662065{{!}}objectcache: Log more info when WANObjectCache async refresh fails]] ([[phab:T264391]]) (duration: 01m 07s)
* 12:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:22 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5002.eqsin.wmnet
* 12:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5002.eqsin.wmnet
* 12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5003.eqsin.wmnet
* 12:07 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] rename ores_articletopics -> weighted_tags (duration: 01m 07s)
* 12:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5003.eqsin.wmnet
* 12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2007.codfw.wmnet
* 11:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2007.codfw.wmnet
* 11:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2008.codfw.wmnet
* 11:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2008.codfw.wmnet
* 11:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2009.codfw.wmnet
* 11:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:662657{{!}} Bumping portals to master (T128546)]] (duration: 01m 07s)
* 11:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:662657{{!}} Bumping portals to master (T128546)]] (duration: 01m 07s)
* 11:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2020.codfw.wmnet
* 11:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2009.codfw.wmnet
* 11:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2020.codfw.wmnet
* 11:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet
* 11:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2010.codfw.wmnet
* 11:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
* 11:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
* 11:25 Urbanecm: Deploy security patch for [[phab:T71617|T71617]]
* 11:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet
* 11:23 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 11:23 hnowlan: resyncing postgres on maps1005
* 11:22 hnowlan: resyncing postgres on maps1001
* 11:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2010.codfw.wmnet
* 11:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4005.ulsfo.wmnet
* 11:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4005.ulsfo.wmnet
* 11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4006.ulsfo.wmnet
* 11:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4006.ulsfo.wmnet
* 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet
* 10:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet
* 10:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2025.codfw.wmnet
* 10:05 moritzm: updating netboot images to Buster 10.8  [[phab:T274099|T274099]]
* 10:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2025.codfw.wmnet
* 09:43 XioNoX: failover pfw3-eqiad RG1 to node 0 [[phab:T263833|T263833]]
* 09:42 marostegui: Stop MySQL on db1111 [[phab:T273982|T273982]]
* 09:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet
* 09:23 vgutierrez: restart varnish-fe on cp1087
* 09:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet
* 09:20 vgutierrez: rolling restart of LVS instances to catch up on kernel upgrades
* 09:00 gehel: depool and restart blazegraph on wdqs1005 / wdqs1012
* 08:56 XioNoX: push pfw policies [[phab:T273989|T273989]]
* 08:33 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 [[phab:T273982|T273982]]', diff saved to https://phabricator.wikimedia.org/P14229 and previous config saved to /var/cache/conftool/dbconfig/20210208-070858-marostegui.json
* 06:50 effie: Removed mc1024 from mcrouter, some resharding is expected
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1094 from dbctl [[phab:T273710|T273710]]', diff saved to https://phabricator.wikimedia.org/P14228 and previous config saved to /var/cache/conftool/dbconfig/20210208-061319-marostegui.json


== 2021-02-07 ==
== 2022-06-16 ==
* 22:58 Urbanecm: Reset password for TheresNoTime ([[phab:T274087|T274087]])
* 23:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
* 23:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 23:38 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
* 23:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
* 22:59 mutante: new Wikipedia languages added to DNS:  blk = https://en.wikipedia.org/wiki/Pa%27O_language  {{!}}  pcm = https://en.wikipedia.org/wiki/Nigerian_Pidgin
* 22:37 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:33 volans@cumin2002: START - Cookbook sre.dns.netbox
* 21:18 thcipriani@deploy1002: Finished scap: noop test (duration: 04m 07s)
* 21:14 thcipriani@deploy1002: Started scap: noop test
* 21:10 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:805433{{!}}CommonSettings: clean up and simplify some code]] (duration: 03m 42s)
* 21:06 thcipriani@deploy1002: Synchronized multiversion/MWRealm.php: Config: [[gerrit:806249{{!}}MWRealm.php: remove unused getRealmSpecificFilename() (T171115)]] (duration: 03m 35s)
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:59 thcipriani@deploy1002: Finished scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]] (duration: 16m 57s)
* 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:42 thcipriani@deploy1002: Started scap: Config: [[gerrit:806248{{!}}phpcs: enable PrefixedGlobalFunctions.allowedPrefix and rename functions (T171115)]]
* 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:27 cjming@deploy1002: Synchronized phpcs.xml: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 27s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:23 cjming@deploy1002: Synchronized wmf-config/: Config: [[gerrit:805432{{!}}phpcs: move SpaceBeforeSingleLineComment.NewLineComment exclusions (T171115)]] (duration: 03m 22s)
* 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805179{{!}}Turn off TOC A/B test for pilot wikis (T309683)]] (duration: 03m 37s)
* 19:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner2001.codfw.wmnet
* 19:39 aokoth@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:23 aokoth@cumin1001: START - Cookbook sre.dns.netbox
* 19:03 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner2001.codfw.wmnet
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab-runner1001.eqiad.wmnet
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29904 and previous config saved to /var/cache/conftool/dbconfig/20220616-185520-marostegui.json
* 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:54 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.eqiad.wmnet
* 18:53 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:50 dzahn@cumin2002: START - Cookbook sre.dns.netbox
* 18:49 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 18:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:44 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to all wikis
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:42 brennen@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/CheckUser/src/Hooks.php: Backport: [[gerrit:806246{{!}}Only try to create User object if username is not null (T310747)]] (duration: 03m 23s)
* 18:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29903 and previous config saved to /var/cache/conftool/dbconfig/20220616-184015-marostegui.json
* 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.eqiad.wmnet
* 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P29902 and previous config saved to /var/cache/conftool/dbconfig/20220616-182510-marostegui.json
* 18:13 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 18:12 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:11 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
* 18:10 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29901 and previous config saved to /var/cache/conftool/dbconfig/20220616-181005-marostegui.json
* 18:10 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
* 17:59 brennen: end of phabricator deploy
* 17:46 brennen: starting phabricator deploy, momentary downtime expected while services restart
* 17:42 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:42 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab.wmfusercontent.org with reason: bug fix
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29900 and previous config saved to /var/cache/conftool/dbconfig/20220616-173738-marostegui.json
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29899 and previous config saved to /var/cache/conftool/dbconfig/20220616-173725-marostegui.json
* 17:31 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:31 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: bug fix
* 17:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:27 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
* 17:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:26 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
* 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29898 and previous config saved to /var/cache/conftool/dbconfig/20220616-172220-marostegui.json
* 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P29897 and previous config saved to /var/cache/conftool/dbconfig/20220616-170715-marostegui.json
* 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29896 and previous config saved to /var/cache/conftool/dbconfig/20220616-165210-marostegui.json
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29895 and previous config saved to /var/cache/conftool/dbconfig/20220616-161844-marostegui.json
* 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
* 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29894 and previous config saved to /var/cache/conftool/dbconfig/20220616-161835-marostegui.json
* 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29893 and previous config saved to /var/cache/conftool/dbconfig/20220616-160330-marostegui.json
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P29892 and previous config saved to /var/cache/conftool/dbconfig/20220616-154825-marostegui.json
* 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29891 and previous config saved to /var/cache/conftool/dbconfig/20220616-153320-marostegui.json
* 15:31 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
* 15:30 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
* 15:30 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
* 15:29 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
* 15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 15:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P29890 and previous config saved to /var/cache/conftool/dbconfig/20220616-151434-ladsgroup.json
* 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P29889 and previous config saved to /var/cache/conftool/dbconfig/20220616-145931-ladsgroup.json
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29888 and previous config saved to /var/cache/conftool/dbconfig/20220616-145136-marostegui.json
* 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29887 and previous config saved to /var/cache/conftool/dbconfig/20220616-145128-marostegui.json
* 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P29886 and previous config saved to /var/cache/conftool/dbconfig/20220616-144427-ladsgroup.json
* 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29885 and previous config saved to /var/cache/conftool/dbconfig/20220616-143623-marostegui.json
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 14:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P29884 and previous config saved to /var/cache/conftool/dbconfig/20220616-142923-ladsgroup.json
* 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P29883 and previous config saved to /var/cache/conftool/dbconfig/20220616-142118-marostegui.json
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29882 and previous config saved to /var/cache/conftool/dbconfig/20220616-140613-marostegui.json
* 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P29881 and previous config saved to /var/cache/conftool/dbconfig/20220616-140453-root.json
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:01 volans@cumin1001: dbctl commit (dc=all): 'Doesn't have new wikiuser', diff saved to https://phabricator.wikimedia.org/P29880 and previous config saved to /var/cache/conftool/dbconfig/20220616-140107-volans.json
* 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P29879 and previous config saved to /var/cache/conftool/dbconfig/20220616-134950-root.json
* 13:45 sukhe: upload bird2_2.0.7-4.1wm1 to apt.wm.o (buster) - [[phab:T310574|T310574]]
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29878 and previous config saved to /var/cache/conftool/dbconfig/20220616-133446-root.json
* 13:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp1089.eqiad.wmnet
* 13:22 jayme@cumin1001: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
* 13:21 jayme@cumin1001: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29877 and previous config saved to /var/cache/conftool/dbconfig/20220616-131942-root.json
* 13:10 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 13:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4004.ulsfo.wmnet to ganeti01.svc.ulsfo.wmnet
* 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4004.ulsfo.wmnet
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29876 and previous config saved to /var/cache/conftool/dbconfig/20220616-130438-root.json
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-tls
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=varnish-fe
* 13:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet,service=ats-be
* 13:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4004.ulsfo.wmnet
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29875 and previous config saved to /var/cache/conftool/dbconfig/20220616-123357-marostegui.json
* 12:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1008.eqiad.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 for schema change', diff saved to https://phabricator.wikimedia.org/P29874 and previous config saved to /var/cache/conftool/dbconfig/20220616-115924-root.json
* 11:53 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1008.eqiad.wmnet
* 11:53 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1007.eqiad.wmnet
* 11:45 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1007.eqiad.wmnet
* 11:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
* 11:38 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
* 11:35 godog: trim swift logs older than 25d from centrallog hosts - [[phab:T309171|T309171]]
* 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2005].codfw.wmnet with reason: reboots
* 11:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet
* 11:27 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet
* 11:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
* 11:19 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29873 and previous config saved to /var/cache/conftool/dbconfig/20220616-111632-marostegui.json
* 11:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 11:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
* 11:09 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 11:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 11:02 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29871 and previous config saved to /var/cache/conftool/dbconfig/20220616-110127-marostegui.json
* 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
* 10:54 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
* 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic[1100-1102].eqiad.wmnet with reason: reboots
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P29870 and previous config saved to /var/cache/conftool/dbconfig/20220616-104622-marostegui.json
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
* 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
* 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
* 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboots
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboots
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1089.eqiad.wmnet
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
* 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1089.eqiad.wmnet
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29869 and previous config saved to /var/cache/conftool/dbconfig/20220616-103117-marostegui.json
* 10:28 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:28 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:21 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: Rebooting to activate new kernel for [[phab:T310483|T310483]]
* 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS buster
* 10:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS buster
* 10:02 elukey: ran `scap install-world --batch` on deploy1002 to allow scap/puppet to work on ml-cache100[2,3]
* 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
* 09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
* 09:32 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS buster
* 09:21 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS buster
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29868 and previous config saved to /var/cache/conftool/dbconfig/20220616-091131-marostegui.json
* 09:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
* 09:02 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6002.drmrs.wmnet
* 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
* 08:45 moritzm: failover ganeti master in drmrs/2 to ganeti6004
* 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:805370{{!}}testwiki: Enable SectionTranslation for 11 Wikipedias (T309384 T310116)]] (duration: 03m 41s)
* 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 07:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 06:49 joal: Rerun webrequest-load-wf-upload-2022-6-15-22 after weird oozie failure


== 2021-02-06 ==
* 08:59 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 08:58 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 08:52 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 03:40 ryankemper: Deleted dump taking up diskspace on `wdqs1009`, disk space warning will resolve now
* 01:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1319.eqiad.wmnet
* 01:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
* 01:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1319.eqiad.wmnet
* 01:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1313.eqiad.wmnet
* 01:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2265.codfw.wmnet
* 00:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1366.eqiad.wmnet
* 00:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1366.eqiad.wmnet
* 00:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2265.codfw.wmnet
* 00:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE
* 00:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE
* 00:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE
* 00:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE
* 00:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE
* 00:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE
* 00:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE
* 00:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE

== 2022-06-15 ==
* 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29867 and previous config saved to /var/cache/conftool/dbconfig/20220615-224845-marostegui.json
* 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29866 and previous config saved to /var/cache/conftool/dbconfig/20220615-223339-marostegui.json
* 22:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS buster
* 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P29865 and previous config saved to /var/cache/conftool/dbconfig/20220615-221834-marostegui.json
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS buster
* 22:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 22:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1016.eqiad.wmnet with OS buster
* 22:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS buster
* 22:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
* 22:12 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1016.eqiad.wmnet with OS buster
* 22:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29864 and previous config saved to /var/cache/conftool/dbconfig/20220615-220329-marostegui.json
* 22:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS buster
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 22:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
* 21:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29863 and previous config saved to /var/cache/conftool/dbconfig/20220615-213241-marostegui.json
* 21:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 21:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
* 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29862 and previous config saved to /var/cache/conftool/dbconfig/20220615-213233-marostegui.json
* 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29861 and previous config saved to /var/cache/conftool/dbconfig/20220615-211728-marostegui.json
* 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P29860 and previous config saved to /var/cache/conftool/dbconfig/20220615-210223-marostegui.json
* 20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29859 and previous config saved to /var/cache/conftool/dbconfig/20220615-204717-marostegui.json
* 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 20:08 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804014{{!}}Remove unused setting wgQuickSurveysUseVue (T285890)]] (duration: 03m 38s)
* 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 19:50 hashar@deploy1002: Finished deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]] (duration: 00m 10s)
* 19:50 hashar@deploy1002: Started deploy [integration/docroot@b95391b]: Add Developer Portal - [[phab:T302809|T302809]]
* 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29858 and previous config saved to /var/cache/conftool/dbconfig/20220615-194703-marostegui.json
* 19:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
* 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29857 and previous config saved to /var/cache/conftool/dbconfig/20220615-194655-marostegui.json
* 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29856 and previous config saved to /var/cache/conftool/dbconfig/20220615-193150-marostegui.json
* 19:31 hashar: wikibugs IRC bot has been restarted by valhallasw \o/ # [[phab:T310734|T310734]]
* 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P29855 and previous config saved to /var/cache/conftool/dbconfig/20220615-191645-marostegui.json
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29854 and previous config saved to /var/cache/conftool/dbconfig/20220615-190140-marostegui.json
* 18:42 hashar: wikibugs (IRC bot for Phabricator/Gerrit) is no longer working and needs a restart [[phab:T310734|T310734]]
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29853 and previous config saved to /var/cache/conftool/dbconfig/20220615-182140-marostegui.json
* 18:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 18:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
* 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 18:10 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]] (duration: 03m 43s)
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 18:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:58 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1015.eqiad.wmnet with OS buster
* 17:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS buster
* 17:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
* 17:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
* 17:52 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1014.eqiad.wmnet with OS buster
* 17:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS buster
* 17:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host stat1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
* 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29851 and previous config saved to /var/cache/conftool/dbconfig/20220615-172738-marostegui.json
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1015.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29849 and previous config saved to /var/cache/conftool/dbconfig/20220615-171233-marostegui.json
* 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1014.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1013.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 17:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.16  refs [[phab:T308069|T308069]]
* 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P29848 and previous config saved to /var/cache/conftool/dbconfig/20220615-165727-marostegui.json
* 16:54 brennen: train 1.39.0-wmf.16 ([[phab:T308069|T308069]]): no current blockers - rolling to group0
* 16:44 jynus: restarting replication for m3 on db1117, not db2078
* 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29847 and previous config saved to /var/cache/conftool/dbconfig/20220615-164222-marostegui.json
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1012.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1011.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1010.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1009.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:29 brennen: phabricator upgrade finished
* 16:27 krinkle@deploy1002: Synchronized multiversion/: {{Gerrit|Id8cdb8aef70f6672}} (duration: 03m 41s)
* 16:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 16:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 16:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host backup1009.eqiad.wmnet
* 16:21 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host backup1009.eqiad.wmnet
* 16:13 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1008.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1007.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-presto1006.mgmt.eqiad.wmnet with reboot policy FORCED
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1118 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29845 and previous config saved to /var/cache/conftool/dbconfig/20220615-160838-marostegui.json
* 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29844 and previous config saved to /var/cache/conftool/dbconfig/20220615-160830-marostegui.json
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS buster
* 15:56 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:53 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P29843 and previous config saved to /var/cache/conftool/dbconfig/20220615-155325-marostegui.json
* 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
* 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
* 15:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
* 15:40 mutante: phabricator upgrade in progress
* 15:39 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 15:39 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20220615-153820-marostegui.json
* 15:35 brennen: starting phabricator deploy, momentary downtime expected while Apache restarts and migrations run
* 15:34 jynus: stopping replication for m3 on db1117, db2078
* 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
* 15:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29841 and previous config saved to /var/cache/conftool/dbconfig/20220615-152315-marostegui.json
* 15:20 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host ms-be1059.eqiad.wmnet with OS bullseye
* 15:20 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 15:20 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phabricator.wikimedia.org with reason: maintenance
* 15:06 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 15:05 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:05 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1001.eqiad.wmnet with reason: maintenance
* 15:03 mutante: phabricator maintenance about to start
* 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
* 15:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 14:59 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
* 14:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 14:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1059.eqiad.wmnet with reason: host reimage
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.rotate-password (exit_code=0)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:53 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
* 14:53 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 14:52 jbond@cumin1001: END (ERROR) - Cookbook sre.pdus.uptime (exit_code=97)
* 14:51 jbond@cumin1001: START - Cookbook sre.pdus.uptime
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29840 and previous config saved to /var/cache/conftool/dbconfig/20220615-145028-marostegui.json
* 14:50 urandom: ALTER-ing replication for codfw (Cassandra) expansion -- [[phab:T307641|T307641]]
* 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29839 and previous config saved to /var/cache/conftool/dbconfig/20220615-145020-marostegui.json
* 14:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:49 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 14:46 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29838 and previous config saved to /var/cache/conftool/dbconfig/20220615-143515-marostegui.json
* 14:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:30 hnowlan@deploy1002: Synchronized private/PrivateSettings.php: [[phab:T308670|T308670]] credentials to access the similar-users service (duration: 03m 32s)
* 14:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
* 14:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:22 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:21 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P29836 and previous config saved to /var/cache/conftool/dbconfig/20220615-142010-marostegui.json
* 14:19 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:18 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 14:16 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS buster
* 14:10 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:09 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jnuche@deploy1002: Installation of scap version "4.9.4" completed for 558 hosts
* 14:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 14:08 jnuche@deploy1002: Installing scap version "4.9.4" for 558 hosts
* 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29834 and previous config saved to /var/cache/conftool/dbconfig/20220615-140505-marostegui.json
* 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
* 14:01 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
* 13:58 awight: EU afternoon backport window complete.
* 13:57 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/Translate/src/PageTranslation/DeleteTranslatableBundleSpecialPage.php: Backport: [[gerrit:805749{{!}}Fix deletion of translation pages outside of NS_MAIN namespace (T310440)]] (duration: 00m 32s)
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29833 and previous config saved to /var/cache/conftool/dbconfig/20220615-135508-root.json
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29832 and previous config saved to /var/cache/conftool/dbconfig/20220615-135502-root.json
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29831 and previous config saved to /var/cache/conftool/dbconfig/20220615-135458-root.json
* 13:54 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:53 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:49 ayounsi@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: deploy new homer wmf-netbox - ayounsi@cumin2002
* 13:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
* 13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29830 and previous config saved to /var/cache/conftool/dbconfig/20220615-134004-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29829 and previous config saved to /var/cache/conftool/dbconfig/20220615-133958-root.json
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29828 and previous config saved to /var/cache/conftool/dbconfig/20220615-133954-root.json
* 13:38 awight@deploy1002: Synchronized php-1.39.0-wmf.16/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: [[gerrit:805745{{!}}Restore internal mechanism to use either back or close button (T310602)]] (duration: 00m 37s)
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29827 and previous config saved to /var/cache/conftool/dbconfig/20220615-133334-marostegui.json
* 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29826 and previous config saved to /var/cache/conftool/dbconfig/20220615-133326-marostegui.json
* 13:31 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 01m 08s)
* 13:30 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.2 (duration: 02m 06s)
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 13:27 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.2
* 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29825 and previous config saved to /var/cache/conftool/dbconfig/20220615-132500-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29824 and previous config saved to /var/cache/conftool/dbconfig/20220615-132454-root.json
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29823 and previous config saved to /var/cache/conftool/dbconfig/20220615-132450-root.json
* 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29822 and previous config saved to /var/cache/conftool/dbconfig/20220615-131820-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29821 and previous config saved to /var/cache/conftool/dbconfig/20220615-130956-root.json
* 13:09 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 03s)
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29820 and previous config saved to /var/cache/conftool/dbconfig/20220615-130951-root.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29819 and previous config saved to /var/cache/conftool/dbconfig/20220615-130946-root.json
* 13:08 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:04 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v3.1 (duration: 01m 43s)
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P29818 and previous config saved to /var/cache/conftool/dbconfig/20220615-130315-marostegui.json
* 13:02 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v3.1
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox2002.codfw.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 13:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on netbox1002.eqiad.wmnet with reason: Netbox upgrade to 3.2
* 12:56 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 58s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:55 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: deploying v2.11.12 (duration: 00m 05s)
* 12:55 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: deploying v2.11.12
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29817 and previous config saved to /var/cache/conftool/dbconfig/20220615-125452-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29816 and previous config saved to /var/cache/conftool/dbconfig/20220615-125447-root.json
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29815 and previous config saved to /var/cache/conftool/dbconfig/20220615-125442-root.json
* 12:51 jbond@deploy1002: Finished deploy [netbox/deploy@7bbf659]: log (duration: 03m 12s)
* 12:48 jbond@deploy1002: Started deploy [netbox/deploy@7bbf659]: log
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29813 and previous config saved to /var/cache/conftool/dbconfig/20220615-124810-marostegui.json
* 12:42 moritzm: failover ganeti master in eqsin to ganeti5001
* 12:42 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:42 volans@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netbox:443 with reason: Netbox upgrade to 3.2 [[phab:T296452|T296452]]
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29812 and previous config saved to /var/cache/conftool/dbconfig/20220615-123949-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29811 and previous config saved to /var/cache/conftool/dbconfig/20220615-123943-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29810 and previous config saved to /var/cache/conftool/dbconfig/20220615-123938-root.json
* 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 12:25 kart_: Updated cxserver to 2022-06-15-074244-production ([[phab:T309266|T309266]], [[phab:T310116|T310116]], [[phab:T309384|T309384]], [[phab:T306963|T306963]])
* 12:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:23 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 es1033 es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P29808 and previous config saved to /var/cache/conftool/dbconfig/20220615-122123-root.json
* 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:19 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:16 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29807 and previous config saved to /var/cache/conftool/dbconfig/20220615-121620-marostegui.json
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
* 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29806 and previous config saved to /var/cache/conftool/dbconfig/20220615-121440-marostegui.json
* 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 12:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29805 and previous config saved to /var/cache/conftool/dbconfig/20220615-115935-marostegui.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29804 and previous config saved to /var/cache/conftool/dbconfig/20220615-115452-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29803 and previous config saved to /var/cache/conftool/dbconfig/20220615-115135-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29802 and previous config saved to /var/cache/conftool/dbconfig/20220615-115127-root.json
* 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29801 and previous config saved to /var/cache/conftool/dbconfig/20220615-114950-marostegui.json
* 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P29800 and previous config saved to /var/cache/conftool/dbconfig/20220615-114430-marostegui.json
* 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29799 and previous config saved to /var/cache/conftool/dbconfig/20220615-113948-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29798 and previous config saved to /var/cache/conftool/dbconfig/20220615-113631-root.json
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29797 and previous config saved to /var/cache/conftool/dbconfig/20220615-113623-root.json
* 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29796 and previous config saved to /var/cache/conftool/dbconfig/20220615-113445-marostegui.json
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 ([[phab:T302659|T302659]])', diff saved to https://phabricator.wikimedia.org/P29795 and previous config saved to /var/cache/conftool/dbconfig/20220615-112924-marostegui.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29794 and previous config saved to /var/cache/conftool/dbconfig/20220615-112444-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29793 and previous config saved to /var/cache/conftool/dbconfig/20220615-112127-root.json
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29792 and previous config saved to /var/cache/conftool/dbconfig/20220615-112119-root.json
* 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P29791 and previous config saved to /var/cache/conftool/dbconfig/20220615-111940-marostegui.json
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29790 and previous config saved to /var/cache/conftool/dbconfig/20220615-110940-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29789 and previous config saved to /var/cache/conftool/dbconfig/20220615-110623-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29788 and previous config saved to /var/cache/conftool/dbconfig/20220615-110616-root.json
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29787 and previous config saved to /var/cache/conftool/dbconfig/20220615-110435-marostegui.json
* 10:55 marostegui: dbmaint es3@eqiad [[phab:T310485|T310485]]
* 10:55 marostegui: dbmaint es2@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui: dbmaint es1@eqiad [[phab:T310485|T310485]]
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29786 and previous config saved to /var/cache/conftool/dbconfig/20220615-105437-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29784 and previous config saved to /var/cache/conftool/dbconfig/20220615-105119-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29783 and previous config saved to /var/cache/conftool/dbconfig/20220615-105112-root.json
* 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
* 10:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
* 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29782 and previous config saved to /var/cache/conftool/dbconfig/20220615-103933-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29781 and previous config saved to /var/cache/conftool/dbconfig/20220615-103615-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P29780 and previous config saved to /var/cache/conftool/dbconfig/20220615-103608-root.json
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P29779 and previous config saved to /var/cache/conftool/dbconfig/20220615-103101-marostegui.json
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,10